On Tue, Feb 7, 2017 at 1:52 PM, Mithun Cy wrote:
> On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers wrote:
>> On 2017-02-07 18:41, Robert Haas wrote:
>>>
>>> Committed with some changes (which I noted in the commit message).
>
> Thanks, Robert and all who have reviewed the patch and given their
> valuable comments.
>
>> This has caused a warning with gcc 6.20:
>>
>> hashpage.c: In function ‘_hash_getcachedmetap’:
>> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
>> function [-Wmaybe-uninitialized]
>> rel->rd_amcache = cache;
>> ^~~
>
> Yes, I also see the warning. I think the compiler is not able to see
> cache is always initialized and used under condition if
> (rel->rd_amcache == NULL).
> I think to make the compiler happy we can initialize the cache with
> NULL when it is defined.
Thanks for the reports and patch. Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Feb 7, 2017 at 11:21 PM, Erik Rijkers wrote:
> On 2017-02-07 18:41, Robert Haas wrote:
>>
>> Committed with some changes (which I noted in the commit message).
Thanks, Robert and all who have reviewed the patch and given their
valuable comments.
> This has caused a warning with gcc 6.20:
>
> hashpage.c: In function ‘_hash_getcachedmetap’:
> hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> rel->rd_amcache = cache;
> ^~~
Yes, I also see the warning. I think the compiler is not able to see
cache is always initialized and used under condition if
(rel->rd_amcache == NULL).
I think to make the compiler happy we can initialize the cache with
NULL when it is defined.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_metap_compiler_warning01
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On 2017-02-07 18:41, Robert Haas wrote:
Committed with some changes (which I noted in the commit message).
This has caused a warning with gcc 6.20:
hashpage.c: In function ‘_hash_getcachedmetap’:
hashpage.c:1245:20: warning: ‘cache’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
rel->rd_amcache = cache;
^~~
which hopefully can be prevented...
thanks,
Erik Rijkers
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Feb 1, 2017 at 12:23 AM, Michael Paquier
wrote:
> On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy wrote:
>>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>>> force_refresh);
>>>
>>> If the cache is initialized and force_refresh is not true, then this
>>> just returns the cached data, and the metabuf argument isn't used.
>>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>>> the metabuffer, pin and lock it, use it to set the cache, release the
>>> lock, and return with the pin still held. If *metabuf !=
>>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>>
>> Thanks, Robert I have made a new patch which tries to do same. Now I
>> think code looks less complicated.
>
> Moved to CF 2017-03.
Committed with some changes (which I noted in the commit message).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Sun, Jan 29, 2017 at 6:13 PM, Mithun Cy wrote:
>> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
>> force_refresh);
>>
>> If the cache is initialized and force_refresh is not true, then this
>> just returns the cached data, and the metabuf argument isn't used.
>> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
>> the metabuffer, pin and lock it, use it to set the cache, release the
>> lock, and return with the pin still held. If *metabuf !=
>> InvalidBuffer, we assume it's pinned and return with it still pinned.
>
> Thanks, Robert I have made a new patch which tries to do same. Now I
> think code looks less complicated.
Moved to CF 2017-03.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

> HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
> force_refresh);
>
> If the cache is initialized and force_refresh is not true, then this
> just returns the cached data, and the metabuf argument isn't used.
> Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
> the metabuffer, pin and lock it, use it to set the cache, release the
> lock, and return with the pin still held. If *metabuf !=
> InvalidBuffer, we assume it's pinned and return with it still pinned.
Thanks, Robert I have made a new patch which tries to do same. Now I
think code looks less complicated.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_14.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Thu, Jan 26, 2017 at 1:48 PM, Mithun Cy wrote:
> _v11 API's was self-sustained one but it does not hold pins on the
> metapage buffer. Whereas in _v12 we hold the pin for two consecutive
> reads of metapage. I have taken your advice and producing 2 different
> patches.
Hmm. I think both of these APIs have some advantages. On the one
hand, passing metabuf sometimes allows you to avoid pin/unpin cycles -
e.g. in hashbulkdelete it makes a fair amount of sense to keep the
metabuf pinned once we've had to read it, just in case we need it
again. On the other hand, it's surprising that passing a value for
the metabuf forces the cached to be refreshed. I wonder if a good API
might be something like this:
HashMetaPage _hash_getcachedmetap(Relation rel, Buffer *metabuf, bool
force_refresh);
If the cache is initialized and force_refresh is not true, then this
just returns the cached data, and the metabuf argument isn't used.
Otherwise, if *metabuf == InvalidBuffer, we set *metabuf to point to
the metabuffer, pin and lock it, use it to set the cache, release the
lock, and return with the pin still held. If *metabuf !=
InvalidBuffer, we assume it's pinned and return with it still pinned.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Jan 24, 2017 at 3:10 PM, Amit Kapila wrote:
> 1.
> @@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,
> In the above flow, do we really need an updated metapage, can't we use
> the cached one? We are already taking care of bucket split down in
> that function.
Yes, we can use the old cached metap entry, the only reason I decided
to use the latest metapage content is because the old code used to do
that. And, cached metap is used to avoid ad-hoc local saving of same
and hence unify the cached metap API. I did not intend to save the
metapage read here which I thought will not be much useful if new
buckets are added anyway we need to read the metapage at the end. I
have taken you comments now I only read metap cache which is already
set.
> 2.
> The above two chunks of code look worse as compare to your previous
> patch. I think what we can do is keep the patch ready with both the
> versions of _hash_getcachedmetap API (as you have in _v11 and _v12)
> and let committer take the final call.
_v11 API's was self-sustained one but it does not hold pins on the
metapage buffer. Whereas in _v12 we hold the pin for two consecutive
reads of metapage. I have taken your advice and producing 2 different
patches.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_13_donotholdpin.patch
Description: Binary data
cache_hash_index_meta_page_13_holdpin.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Jan 18, 2017 at 11:51 AM, Mithun Cy wrote:
> On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila wrote:
>
>> (b) Another somewhat bigger problem is that with this new change it
>> won't retain the pin on meta page till the end which means we might
>> need to perform an I/O again during operation to fetch the meta page.
>> AFAICS, you have just changed it so that you can call new API
>> _hash_getcachedmetap, if that's true, then I think you have to find
>> some other way of doing it. BTW, why can't you design your new API
>> such that it always take pinned metapage? You can always release the
>> pin in the caller if required. I understand that you don't always
>> need a metapage in that API, but I think the current usage of that API
>> is also not that good.
>
>
> -- Yes what you say is right. I wanted to make _hash_getcachedmetap
> API self-sufficient. But always 2 possible consecutive reads for
> metapage are connected as we pin the page to buffer to avoid I/O. Now
> redesigned the API such a way that caller pass pinned meta page if we
> want to set the metapage cache; So this gives us the flexibility to
> use the cached meta page data in different places.
>
1.
@@ -505,26 +505,22 @@ hashbulkdelete(IndexVacuumInfo *info,
IndexBulkDeleteResult *stats,
..
..
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ cachedmetap = _hash_getcachedmetap(rel, metabuf);
In the above flow, do we really need an updated metapage, can't we use
the cached one? We are already taking care of bucket split down in
that function.
2.
+HashMetaPage
+_hash_getcachedmetap(Relation rel, Buffer metabuf)
+{
..
..
+ if (BufferIsInvalid(metabuf))
+ return (HashMetaPage) rel->rd_amcache;
..
+_hash_getbucketbuf_from_hashkey(Relation rel, uint32 hashkey, int access,
+ HashMetaPage *cachedmetap)
{
..
+ if (!(metap = _hash_getcachedmetap(rel, InvalidBuffer)))
+ {
+ metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
+ metap = _hash_getcachedmetap(rel, metabuf);
+ Assert(metap != NULL);
+ }
..
}
The above two chunks of code look worse as compare to your previous
patch. I think what we can do is keep the patch ready with both the
versions of _hash_getcachedmetap API (as you have in _v11 and _v12)
and let committer take the final call.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Jan 17, 2017 at 10:07 AM, Amit Kapila wrote:
> 1.
> (a) I think you can retain the previous comment or modify it slightly.
> Just removing the whole comment and replacing it with a single line
> seems like a step backward.
-- Fixed, Just modified to say it
> (b) Another somewhat bigger problem is that with this new change it
> won't retain the pin on meta page till the end which means we might
> need to perform an I/O again during operation to fetch the meta page.
> AFAICS, you have just changed it so that you can call new API
> _hash_getcachedmetap, if that's true, then I think you have to find
> some other way of doing it. BTW, why can't you design your new API
> such that it always take pinned metapage? You can always release the
> pin in the caller if required. I understand that you don't always
> need a metapage in that API, but I think the current usage of that API
> is also not that good.
-- Yes what you say is right. I wanted to make _hash_getcachedmetap
API self-sufficient. But always 2 possible consecutive reads for
metapage are connected as we pin the page to buffer to avoid I/O. Now
redesigned the API such a way that caller pass pinned meta page if we
want to set the metapage cache; So this gives us the flexibility to
use the cached meta page data in different places.
> 2.
> + if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
> + bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
> + cachedmetap = _hash_getcachedmetap(rel, true);
>
> I don't understand the meaning of above if check. It seems like you
> will update the metapage when previous block number is not a valid
> block number which will be true at the first split. How will you
> ensure that there is a re-split and cached metapage is not relevant.
> I think if there is && in the above condition, then we can ensure it.
>
-- Oops that was a mistake corrected as you stated.
> 3.
> + Given a hashkey get the target bucket page with read lock, using cached
> + metapage. The getbucketbuf_from_hashkey method below explains the same.
> +
>
> All the sentences in algorithm start with small letters, then why do
> you need an exception for this sentence. I think you don't need to
> add an empty line. Also, I think the usage of
> getbucketbuf_from_hashkey seems out of place. How about writing it
> as:
>
> The usage of cached metapage is explained later.
>
-- Fixed as like you have asked.
>
> 4.
> + If target bucket is split before metapage data was cached then we are
> + done.
> + Else first release the bucket page and then update the metapage cache
> + with latest metapage data.
>
> I think it is better to retain original text of readme and add about
> meta page update.
>
-- Fixed. Now where ever it is meaning full I have kept original
wordings. But, the way we get to right target buffer after the latest
split is slightly different than what the original code did. So there
is a slight modification to show we use metapage cache.
> 5.
> + Loop:
> ..
> ..
> + Loop again to reach the new target bucket.
>
> No need to write "Loop again ..", that is implicit.
>
-- Fixed as liked you have asked.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_12.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Fri, Jan 13, 2017 at 9:58 AM, Mithun Cy wrote:
> On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila wrote:
Below are review comments on latest version of patch.
1.
/*
- * Read the metapage to fetch original bucket and tuple counts. Also, we
- * keep a copy of the last-seen metapage so that we can use its
- * hashm_spares[] values to compute bucket page addresses. This is a bit
- * hokey but perfectly safe, since the interesting entries in the spares
- * array cannot change under us; and it beats rereading the metapage for
- * each bucket.
+ * update and get the metapage cache data.
*/
- metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
- metap = HashPageGetMeta(BufferGetPage(metabuf));
- orig_maxbucket = metap->hashm_maxbucket;
- orig_ntuples = metap->hashm_ntuples;
- memcpy(_metapage, metap, sizeof(local_metapage));
- /* release the lock, but keep pin */
- LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
+ cachedmetap = _hash_getcachedmetap(rel, true);
+ orig_maxbucket = cachedmetap->hashm_maxbucket;
+ orig_ntuples = cachedmetap->hashm_ntuples;
(a) I think you can retain the previous comment or modify it slightly.
Just removing the whole comment and replacing it with a single line
seems like a step backward.
(b) Another somewhat bigger problem is that with this new change it
won't retain the pin on meta page till the end which means we might
need to perform an I/O again during operation to fetch the meta page.
AFAICS, you have just changed it so that you can call new API
_hash_getcachedmetap, if that's true, then I think you have to find
some other way of doing it. BTW, why can't you design your new API
such that it always take pinned metapage? You can always release the
pin in the caller if required. I understand that you don't always
need a metapage in that API, but I think the current usage of that API
is also not that good.
2.
+ if (bucket_opaque->hasho_prevblkno != InvalidBlockNumber ||
+ bucket_opaque->hasho_prevblkno > cachedmetap->hashm_maxbucket)
+ cachedmetap = _hash_getcachedmetap(rel, true);
I don't understand the meaning of above if check. It seems like you
will update the metapage when previous block number is not a valid
block number which will be true at the first split. How will you
ensure that there is a re-split and cached metapage is not relevant.
I think if there is && in the above condition, then we can ensure it.
3.
+ Given a hashkey get the target bucket page with read lock, using cached
+ metapage. The getbucketbuf_from_hashkey method below explains the same.
+
All the sentences in algorithm start with small letters, then why do
you need an exception for this sentence. I think you don't need to
add an empty line. Also, I think the usage of
getbucketbuf_from_hashkey seems out of place. How about writing it
as:
The usage of cached metapage is explained later.
4.
+ If target bucket is split before metapage data was cached then we are
+ done.
+ Else first release the bucket page and then update the metapage cache
+ with latest metapage data.
I think it is better to retain original text of readme and add about
meta page update.
5.
+ Loop:
..
..
+ Loop again to reach the new target bucket.
No need to write "Loop again ..", that is implicit.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Jan 11, 2017 at 12:46 AM, Robert Haas wrote:
> Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
> this new logic? Or at least update the comments?
I have introduced a new function _hash_getcachedmetap in patch 11 [1] with
this hashbulkdelete() can use metapage cache instead of saving it locally.
[1] cache_hash_index_meta_page_11.patch
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

On Fri, Jan 6, 2017 at 11:43 AM, Amit Kapila wrote:
> Few more comments:
> 1.
> no need to two extra lines, one is sufficient and matches the nearby
> coding pattern.
-- Fixed.
> 2.
> Do you see anywhere else in the code the pattern of using @ symbol in
> comments before function name?
-- Fixed.
> 3.
>
> after this change, i think you can directly use bucket in
> _hash_finish_split instead of pageopaque->hasho_bucket
-- Fixed.
> 4.
>
> Won't the check similar to the existing check (if (*bufp ==
> so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
> above this code will suffice the need? If so, then you can check it
> once and use it in both places.
>
-- Fixed.
> 5. The reader and insertion algorithm needs to be updated in README.
-- Added info in README.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_11.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Dec 27, 2016 at 3:06 AM, Mithun Cy wrote:
> Thanks, just like _bt_getroot I am introducing a new function
> _hash_getbucketbuf_from_hashkey() which will give the target bucket
> buffer for the given hashkey. This will use the cached metapage
> contents instead of reading meta page buffer if cached data is valid.
> There are 2 places which can use this service 1. _hash_first and 2.
> _hash_doinsert.
Can we adapt the ad-hoc caching logic in hashbulkdelete() to work with
this new logic? Or at least update the comments?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Thu, Jan 5, 2017 at 11:38 AM, Mithun Cy wrote:
> On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy wrote:
> I have re-based the patch to fix one compilation warning
> @_hash_doinsert where variable bucket was only used for Asserting but
> was not declared about its purpose.
>
Few more comments:
1.
}
+
+
+/*
+ * _hash_getbucketbuf_from_hashkey() -- Get the bucket's buffer for the given
no need to two extra lines, one is sufficient and matches the nearby
coding pattern.
2.
+ * old bucket in case of bucket split, see @_hash_doinsert().
Do you see anywhere else in the code the pattern of using @ symbol in
comments before function name? In general, there is no harm in using
it, but maybe it is better to be consistent with usage at other
places.
3.
+ /*
+ * @_hash_getbucketbuf_from_hashkey we have verified the hasho_bucket.
+ * Should be safe to use further.
+ */
+ bucket = pageopaque->hasho_bucket;
/*
* If this bucket is in the process of being split, try to finish the
@@ -152,7 +107,9 @@ restart_insert:
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
- maxbucket, highmask, lowmask);
+ usedmetap->hashm_maxbucket,
after this change, i think you can directly use bucket in
_hash_finish_split instead of pageopaque->hasho_bucket
4.
@@ -154,8 +154,11 @@ _hash_readprev(IndexScanDesc scan,
*bufp = InvalidBuffer;
/* check for interrupts while we're not holding any buffer lock */
CHECK_FOR_INTERRUPTS();
- if (BlockNumberIsValid(blkno))
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
Won't the check similar to the existing check (if (*bufp ==
so->hashso_bucket_buf || *bufp == so->hashso_split_bucket_buf)) just
above this code will suffice the need? If so, then you can check it
once and use it in both places.
5. The reader and insertion algorithm needs to be updated in README.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Jan 4, 2017 at 5:21 PM, Mithun Cy wrote:
I have re-based the patch to fix one compilation warning
@_hash_doinsert where variable bucket was only used for Asserting but
was not declared about its purpose.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_10.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Wed, Jan 4, 2017 at 4:19 PM, Mithun Cy wrote:
> As part performance/space analysis of hash index on varchar data typevarchar
> data type
> with this patch, I have run some tests for same with modified pgbench.with
> this patch, I have run some tests for same with modified pgbench.
I forgot to mention all these tests were run on power2 machine which
has 192 2 hyperthreads.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Jan 3, 2017 at 12:05 PM, Mithun Cy wrote:
As part performance/space analysis of hash index on varchar data type
with this patch, I have run some tests for same with modified pgbench.
Modification includes:
1. Changed aid column of pg_accounts table from int to varchar(x)
2. To generate unique values as before inserted stringified integer
repeatedly to fill x.
I have mainly tested for varchar(30) and varchar(90),
Below document has the detailed report on our captured data. New hash
indexes have some ~25% improved performance over btree. And, as
expected very space efficient.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
hash_index_sizes_experiment_01.ods
Description: application/vnd.oasis.opendocument.spreadsheet
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Thanks Amit for detailed review, and pointing out various issues in
the patch. I have tried to fix all of your comments as below.
On Mon, Jan 2, 2017 at 11:29 AM, Amit Kapila wrote:
> 1.
> usage "into to .." in above comment seems to be wrong.usage "into to .." in
> above comment seems to be wrong.usage "into to .." in above comment seems to
> be wrong.usage "into to .." in above comment seems to be wrong.
-- Fixed.
> 2.
> In the above comment, a reference to HashMetaCache is confusing, are
> your referring to any structure here? If you change this, consider toyour
> referring to any structure here? If you change this, consider toyour
> referring to any structure here? If you change this, consider toyour
> referring to any structure here? If you change this, consider to
> change the similar usage at other places in the patch as well.change the
> similar usage at other places in the patch as well.change the similar usage
> at other places in the patch as well.change the similar usage at other places
> in the patch as well.
-- Fixed. Removed HashMetaCache everywhere in the code. Where ever
needed added HashMetaPageData.
> Also, it is not clear to what do you mean by ".. to indicate bucketto
> indicate bucket
> has been initialized .."? assigning maxbucket as a special value tohas been
> initialized .."? assigning maxbucket as a special value to
> prevblkno is not related to initializing a bucket page.prevblkno is not
> related to initializing a bucket page.
-- To be consistent with our definition of prevblkno, I am setting
prevblkno with current hashm_maxbucket when we initialize the bucket
page. I have tried to correct the comments accordingly.
> 3.
> In above comment, where you are saying ".. caching the some of the
> meta page data .." is slightly confusing, because it appears to me
> that you are caching whole of the metapage not some part of it.
-- Fixed. Changed to caching the HashMetaPageData.
> 4.
> +_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
>
> Generally, for _hash_* API's, we use rel as the first parameter, so I
> think it is better to maintain the same with this API as well.
-- Fixed.
> 5.
> _hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
> - maxbucket, highmask, lowmask);
> + usedmetap->hashm_maxbucket,
> + usedmetap->hashm_highmask,
> + usedmetap->hashm_lowmask);
> I think we should add an Assert for the validity of usedmetap before using it.
-- Fixed. Added Assert(usedmetap != NULL);
> 6. Getting few compilation errors in assert-enabled build.
>
-- Fixed. Sorry, I missed handling bucket number which is needed at
below codes. I have fixed same now.
> 7.
> I can understand what you want to say in above comment, but I think
> you can write it in somewhat shorter form. There is no need to
> explain the exact check.
-- Fixed. I have tried to compress it into a few lines.
> 8.
> @@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
> _hash_relbuf(rel, *bufp);
>
> *bufp = InvalidBuffer;
> +
> + /* If it is a bucket page there will not be a prevblkno. */
> + if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
> + return;
> +
>
> I don't think above check is right. Even if it is a bucket page, we
> might need to scan bucket being populated, refer check else if
> (so->hashso_buc_populated && so->hashso_buc_split). Apart from that,
> you can't access bucket page after releasing the lock on it. Why have
> you added such a check?
>
-- Fixed. That was a mistake, now I have fixed it. Actually, if bucket
page is passed to _hash_readprev then there will not be a prevblkno.
But from this patch onwards prevblkno of bucket page will store
hashm_maxbucket. So a check BlockNumberIsValid (blkno) will not be
valid anymore. I have fixed by adding as below.
+ /* If it is a bucket page there will not be a prevblkno. */
+ if (!((*opaquep)->hasho_flag & LH_BUCKET_PAGE))
{
+ Assert(BlockNumberIsValid(blkno));
There are 2 other places which does same @_hash_freeovflpage,
@_hash_squeezebucket.
But that will only be called for overflow pages. So I did not make
changes. But I think we should also change there to make it
consistent.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_09.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Dec 27, 2016 at 1:36 PM, Mithun Cy wrote:
> On Thu, Dec 22, 2016 at 12:17 PM, Amit Kapila wrote:
>> On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas wrote:
>>> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy
>>> wrote:
-- I think if it is okay, I can document same for each member of
HashMetaPageData whether to read from cached from meta page or directly
from current meta page. Below briefly I have commented for each member. If
you suggest I can go with that approach, I will produce a neat patch for
same.
>>>
>>> Plain text emails are preferred on this list.
>
> Sorry, I have set the mail to plain text mode now.
>
>> I think this will make metpage cache access somewhat
>> similar to what we have in btree where we use cache to access
>> rootpage. Will something like that address your concern?
>
> Thanks, just like _bt_getroot I am introducing a new function
> _hash_getbucketbuf_from_hashkey() which will give the target bucket
> buffer for the given hashkey. This will use the cached metapage
> contents instead of reading meta page buffer if cached data is valid.
> There are 2 places which can use this service 1. _hash_first and 2.
> _hash_doinsert.
>
This version of the patch looks much better than the previous version.
I have few review comments:
1.
+ * pin so we can use the metabuf while writing into to it below.
+ */
+ metabuf =
_hash_getbuf(rel, HASH_METAPAGE, HASH_NOLOCK, LH_META_PAGE);
usage "into to .." in above comment seems to be wrong.
2.
- pageopaque->hasho_prevblkno = InvalidBlockNumber;
+
+ /*
+ *
Setting hasho_prevblkno of bucket page with latest maxbucket number
+ * to indicate
bucket has been initialized and need to reconstruct
+ * HashMetaCache if it is older.
+
*/
+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
In the above comment, a reference to HashMetaCache is confusing, are
your referring to any structure here? If you change this, consider to
change the similar usage at other places in the patch as well.
Also, it is not clear to what do you mean by ".. to indicate bucket
has been initialized .."? assigning maxbucket as a special value to
prevblkno is not related to initializing a bucket page.
3.
typedef struct HashPageOpaqueData
{
+ /*
+ * If this is an ovfl page this stores previous ovfl
(or bucket) blkno.
+ * Else if this is a bucket page we use this for a special purpose. We
+ *
store hashm_maxbucket value, whenever this page is initialized or
+ * split. So this helps us
to know whether the bucket has been split after
+ * caching the some of the meta page data.
See _hash_doinsert(),
+ * _hash_first() to know how to use same.
+ */
In above comment, where you are saying ".. caching the some of the
meta page data .." is slightly confusing, because it appears to me
that you are caching whole of the metapage not some part of it.
4.
+_hash_getbucketbuf_from_hashkey(uint32 hashkey, Relation rel,
Generally, for _hash_* API's, we use rel as the first parameter, so I
think it is better to maintain the same with this API as well.
5.
_hash_finish_split(rel, metabuf, buf, pageopaque->hasho_bucket,
- maxbucket, highmask, lowmask);
+ usedmetap->hashm_maxbucket,
+ usedmetap->hashm_highmask,
+ usedmetap->hashm_lowmask);
I think we should add an Assert for the validity of usedmetap before using it.
6. Getting few compilation errors in assert-enabled build.
1>src/backend/access/hash/hashinsert.c(85): error C2065: 'bucket' :
undeclared identifier
1>src/backend/access/hash/hashinsert.c(155): error C2065: 'bucket' :
undeclared identifier
1>src/backend/access/hash/hashsearch.c(223): warning C4101: 'blkno' :
unreferenced local variable
1> hashutil.c
1>\src\backend\access\hash\hashsearch.c(284): warning C4700:
uninitialized local variable 'bucket' used
7.
+ /*
+ * Check if this bucket is split after we have cached the hash meta
+ * data. To do this we need to check whether cached maxbucket number
+ * is less than or equal to maxbucket number stored in bucket page,
+ * which was set with that times maxbucket number during bucket page
+ * splits. In case of upgrade hashno_prevblkno of old bucket page will
+ * be set with InvalidBlockNumber. And as of now maximum value the
+ * hashm_maxbucket can take is 1 less than InvalidBlockNumber (see
+ * _hash_expandtable). So an explicit check for InvalidBlockNumber in
+ * hasho_prevblkno will tell whether current bucket has been split
+ * after caching hash meta data.
+ */
I can understand what you want to say in above comment, but I think
you can write it in somewhat shorter form. There is no need to
explain the exact check.
8.
@@ -152,6 +152,11 @@ _hash_readprev(IndexScanDesc scan,
_hash_relbuf(rel, *bufp);
*bufp = InvalidBuffer;
+
+ /* If it is a bucket page there will not be a prevblkno. */
+ if ((*opaquep)->hasho_flag & LH_BUCKET_PAGE)
+ return;
+
I don't think above check is

On Wed, Dec 21, 2016 at 9:26 PM, Robert Haas wrote:
> On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy wrote:
>> -- I think if it is okay, I can document same for each member of
>> HashMetaPageData whether to read from cached from meta page or directly from
>> current meta page. Below briefly I have commented for each member. If you
>> suggest I can go with that approach, I will produce a neat patch for same.
>
> Plain text emails are preferred on this list.
>
> I don't have any confidence in this approach. I'm not sure exactly
> what needs to be changed here, but what you're doing right now is just
> too error-prone. There's a cached metapage available, and you've got
> code accessing directly, and that's OK except when it's not, and maybe
> we can add some comments to explain, but I don't think that's going to
> be good enough to really make it clear and maintainable. We need some
> kind of more substantive safeguard to prevent the cached metapage data
> from being used in unsafe ways -- and while we're at it, we should try
> to use it in as many of the places where it *is* safe as possible. My
> suggestion for a separate structure was one idea; another might be
> providing some kind of API that's always used to access the metapage
> cache. Or maybe there's a third option.
>
This metapage cache can be validated only when we have a bucket in
which we have stored the maxbucket value. I think what we can do to
localize the use of metapage cache is to write a new API which will
return a bucket page locked in specified mode based on hashkey.
Something like Buffer _hash_get_buc_buffer_from_hashkey(hashkey,
lockmode). I think this will make metpage cache access somewhat
similar to what we have in btree where we use cache to access
rootpage. Will something like that address your concern?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Dec 20, 2016 at 2:25 PM, Mithun Cy wrote:
> -- I think if it is okay, I can document same for each member of
> HashMetaPageData whether to read from cached from meta page or directly from
> current meta page. Below briefly I have commented for each member. If you
> suggest I can go with that approach, I will produce a neat patch for same.
Plain text emails are preferred on this list.
I don't have any confidence in this approach. I'm not sure exactly
what needs to be changed here, but what you're doing right now is just
too error-prone. There's a cached metapage available, and you've got
code accessing directly, and that's OK except when it's not, and maybe
we can add some comments to explain, but I don't think that's going to
be good enough to really make it clear and maintainable. We need some
kind of more substantive safeguard to prevent the cached metapage data
from being used in unsafe ways -- and while we're at it, we should try
to use it in as many of the places where it *is* safe as possible. My
suggestion for a separate structure was one idea; another might be
providing some kind of API that's always used to access the metapage
cache. Or maybe there's a third option. But this, the way it is
right now, is just too ad-hoc, even with more comments. IMHO, anyway.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Dec 20, 2016 at 3:51 AM, Robert Haas wrote:
> On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy
> wrote:
>
>> Shouldn't _hash_doinsert() be using the cache, too
>>>
>>
>> Yes, we have an opportunity there, added same in code. But one difference
>> is at the end at-least once we need to read the meta page to split and/or
>> write. Performance improvement might not be as much as read-only.
>>
>
> Why would you need to read it at least once in one case but not the other?
>
For read-only: in _hash_first if target bucket is not plit after caching
the meta page contents. we never need to read the metapage. But for insert:
in _hash_doinsert at the end we modify the meta page to store the number of
tuples.
+ * Write-lock the metapage so we can increment the tuple count. After
+ * incrementing it, check to see if it's time for a split.
+ */
+ _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
+
+ metap->hashm_ntuples += 1;
Well, I guess that makes it more appealing to cache the whole page at least
> in terms of raw number of bytes, but I suppose my real complaint here is
> that there don't seem to be any clear rules for where, whether, and to what
> extent we can rely on the cache to be valid.
>
--- Okay will revert back to cache the entire meta page.
> Without that, I think this patch is creating an extremely serious
> maintenance hazard for anyone who wants to try to modify this code in the
> future. A future patch author needs to be able to tell what data they can
> use and what data they can't use, and why.
>
-- I think if it is okay, I can document same for each member
of HashMetaPageData whether to read from cached from meta page or directly
from current meta page. Below briefly I have commented for each member. If
you suggest I can go with that approach, I will produce a neat patch for
same.
*typedef struct HashMetaPageData*
*{*
*1. uint32 hashm_magic; /* magic no. for hash tables */*
-- I think this should remain same. So can be read from catched meta page.
*2. uint32 hashm_version; /* version ID */*
-- This is one time initied, never changed afterwards. So can be read from
catched metapage.
*3. double hashm_ntuples; /* number of tuples stored in the table */*
*-*- This changes on every insert. So should not be read from chached data.
*4. uint16 hashm_ffactor; /* target fill factor (tuples/bucket) */*
-- This is one time initied, never changed afterwards. So can be read from
catched metapage.
*5. uint16 hashm_bsize; /* index page size (bytes) */*
-- This is one time initied, never changed afterwards. So can be read from
catched metapage.
*6. uint16 hashm_bmsize; /* bitmap array size (bytes) - must be a power of
2 */*
-- This is one time initied, never changed afterwards. So can be read from
catched metapage
*7. uint16 hashm_bmshift; /* log2(bitmap array size in BITS) */*
-- This is one time initied, never changed afterwards. So can be read from
catched metapage
*8. *If hashm_maxbucket, hashm_highmask and hashm_lowmask are all read and
cached at same time when metapage was locked, then key to bucket number map
function _hash_hashkey2bucket() should always produce same output. If
bucket is split after caching above elements (which can be known because
old bucket pages will never move once allocated and we mark bucket
pages hasho_prevblkno with incremented hashm_highmask), we can invalidate
them and re-read same from meta page. If your intention is not to save a
metapage read while trying to map the key to buket page, then do not read
them from cached meta page.
* uint32 hashm_maxbucket; /* ID of maximum bucket in use */*
* uint32 hashm_highmask; /* mask to modulo into entire table */*
* uint32 hashm_lowmask; /* mask to modulo into lower half of table */*
*9.*
* uint32 hashm_ovflpoint;/* splitpoint from which ovflpgs being*
* * allocated */*
-- Since used for allocation of overflow pages, should get latest value
directly from meta page.
*10.*
* uint32 hashm_firstfree; /* lowest-number free ovflpage (bit#) */*
-- Should always be read from metapage directly.
*11. *
* uint32 hashm_nmaps; /* number of bitmap pages */*
-- Should always be read from metapage directly.
*12.*
*RegProcedure hashm_procid; /* hash procedure id from pg_proc */*
-- Never used till now, When we use, need to look at it about its policy.
*13*
*uint32 hashm_spares[HASH_MAX_SPLITPOINTS]; /* spare pages before*
* * each splitpoint */*
Spares before given split_point never change and bucket pages never move.
So when combined with cached hashm_maxbucket, hashm_highmask
and hashm_lowmask, all read at same time under lock, BUCKET_TO_BLKNO should
always produce same output, pointing to right physical block. Should only
be used to save a meta page read when we want to map key to bucket block as
said above.
*14.*
* BlockNumber hashm_mapp[HASH_MAX_BITMAPS]; /* blknos of ovfl bitmaps */*
Always read from metapage durectly.
*}

On Fri, Dec 16, 2016 at 5:16 AM, Mithun Cy
wrote:
> Shouldn't _hash_doinsert() be using the cache, too
>>
>
> Yes, we have an opportunity there, added same in code. But one difference
> is at the end at-least once we need to read the meta page to split and/or
> write. Performance improvement might not be as much as read-only.
>
Why would you need to read it at least once in one case but not the other?
>
>> I think it's probably not a good idea to cache the entire metapage. The
>> only things that you are "really" trying to cache, I think, are
>> hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
>> HashPageMetaData structure is 696 bytes on my machine, and it doesn't
>> really make sense to copy the whole thing into memory if you only need 16
>> bytes of it. It could even be dangerous -- if somebody tries to rely on
>> the cache for some other bit of data and we're not really guaranteeing that
>> it's fresh enough for that.
>>
>> I'd suggest defining a new structure HashMetaDataCache with members
>> hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
>> comment explaining that we only care about having the data be fresh enough
>> to test whether the bucket mapping we computed for a tuple is still
>> correct, and that for that to be the case we only need to know whether a
>> bucket has suffered a new split since we last refreshed the cache.
>>
>
> It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3
> uint32s) but we also need
>
> *uint32 hashm_spares[HASH_MAX_SPLITPOINTS],* for bucket number to
> block mapping in "BUCKET_TO_BLKNO(metap, bucket)".
>
> Note : #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) =
> 35*4 = 140 bytes.
>
Well, I guess that makes it more appealing to cache the whole page at least
in terms of raw number of bytes, but I suppose my real complaint here is
that there don't seem to be any clear rules for where, whether, and to what
extent we can rely on the cache to be valid. Without that, I think this
patch is creating an extremely serious maintenance hazard for anyone who
wants to try to modify this code in the future. A future patch author
needs to be able to tell what data they can use and what data they can't
use, and why.
Apart from this, there seems to be some base bug in _hash_doinsert().
> +* XXX this is useless code if we are only storing hash keys.
>
> + */
>
> +if (itemsz > HashMaxItemSize((Page) metap))
>
> +ereport(ERROR,
>
> +(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>
> + errmsg("index row size %zu exceeds hash maximum %zu",
>
> +itemsz, HashMaxItemSize((Page) metap)),
>
> + errhint("Values larger than a buffer page cannot be
> indexed.")));
>
> "metap" (HashMetaPage) and Page are different data structure their member
> types are not in sync, so should not typecast blindly as above. I think we
> should remove this part of the code as we only store hash keys. So I have
> removed same but kept the assert below as it is.
>
Any existing bugs should be the subject of a separate patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Thanks Robert, I have tried to address all of the comments,
On Tue, Dec 6, 2016 at 2:20 AM, Robert Haas wrote:
>
> +bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
>metap->hashm_highmask,
>metap->hashm_lowmask);
>
> This hunk appears useless.
>
> +metap = (HashMetaPage)rel->rd_amcache;
>
> Whitespace.
>
Fixed.
>
> +/* Cache the metapage data for next time*/
>
> Whitespace.
>
Fixed.
> +/* Check if this bucket is split after we have cached the
> metapage.
>
> Whitespace.
>
Fixed.
>
> Shouldn't _hash_doinsert() be using the cache, too?
>
Yes, we have an opportunity there, added same in code. But one difference
is at the end at-least once we need to read the meta page to split and/or
write. Performance improvement might not be as much as read-only.
I did some pgbench simple-update tests for same, with below changes.
- "alter table pgbench_branches add primary key (bid)",
- "alter table pgbench_tellers add primary key (tid)",
- "alter table pgbench_accounts add primary key (aid)"
+ "create index pgbench_branches_bid on pgbench_branches
using hash (bid)",
+ "create index pgbench_tellers_tid on pgbench_tellers using
hash (tid)",
+ "create index pgbench_accounts_aid on pgbench_accounts
using hash (aid)"
And, removed all reference keys. But I see no improvements; I will further
do benchmarking for copy command and report same.
Clients
After Meta page cache
Base Code
%imp
1
2276.151633
2304.253611
-1.2195696631
32
36816.596513
36439.552652
1.0347104549
64
50943.763133
51005.236973
-0.120524565
128
49156.980457
48458.275106
1.4418700407
Above result is median of three runs, and each run is for 30mins.
*Postgres Server settings:*
./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
checkpoint_completion_target=0.9
*pgbench settings:*
scale_factor = 300 (so database fits in shared_buffer)
Mode = -M prepared -N (prepared simple-update).
*Machine used:*
power2 same as described as above.
> I think it's probably not a good idea to cache the entire metapage. The
> only things that you are "really" trying to cache, I think, are
> hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
> HashPageMetaData structure is 696 bytes on my machine, and it doesn't
> really make sense to copy the whole thing into memory if you only need 16
> bytes of it. It could even be dangerous -- if somebody tries to rely on
> the cache for some other bit of data and we're not really guaranteeing that
> it's fresh enough for that.
>
> I'd suggest defining a new structure HashMetaDataCache with members
> hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
> comment explaining that we only care about having the data be fresh enough
> to test whether the bucket mapping we computed for a tuple is still
> correct, and that for that to be the case we only need to know whether a
> bucket has suffered a new split since we last refreshed the cache.
>
It is not only hashm_maxbucket, hashm_lowmask, and hashm_highmask (3
uint32s) but we also need
*uint32 hashm_spares[HASH_MAX_SPLITPOINTS],* for bucket number to
block mapping in "BUCKET_TO_BLKNO(metap, bucket)".
Note : #define HASH_MAX_SPLITPOINTS 32, so it is (3*uint32 + 32*uint32) =
35*4 = 140 bytes.
>
> The comments in this patch need some work, e.g.:
>
> -
> + oopaque->hasho_prevblkno = maxbucket;
>
> No comment?
>
>
I have tried to improve commenting part in the new patch.
Apart from this, there seems to be some base bug in _hash_doinsert().
+* XXX this is useless code if we are only storing hash keys.
+ */
+if (itemsz > HashMaxItemSize((Page) metap))
+ereport(ERROR,
+(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+ errmsg("index row size %zu exceeds hash maximum %zu",
+itemsz, HashMaxItemSize((Page) metap)),
+ errhint("Values larger than a buffer page cannot be
indexed.")));
"metap" (HashMetaPage) and Page are different data structure their member
types are not in sync, so should not typecast blindly as above. I think we
should remove this part of the code as we only store hash keys. So I have
removed same but kept the assert below as it is.
Also, there was a bug in the previous patch. I was not releasing the bucket
page lock if cached metadata is old, now same is fixed.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
cache_hash_index_meta_page_07.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Mon, Dec 5, 2016 at 2:58 PM, Mithun Cy
wrote:
> On Thu, Dec 1, 2016 at 8:10 PM, Jesper Pedersen <
> jesper.peder...@redhat.com> wrote:
> >As the concurrent hash index patch was committed in 6d46f4 this patch
> needs a rebase.
>
> Thanks Jesper,
>
> Adding the rebased patch.
>
-bucket = _hash_hashkey2bucket(hashkey,
- metap->hashm_maxbucket,
+bucket = _hash_hashkey2bucket(hashkey, metap->hashm_maxbucket,
metap->hashm_highmask,
metap->hashm_lowmask);
This hunk appears useless.
+metap = (HashMetaPage)rel->rd_amcache;
Whitespace.
+/* Cache the metapage data for next time*/
Whitespace.
+/* Check if this bucket is split after we have cached the metapage.
Whitespace.
Shouldn't _hash_doinsert() be using the cache, too?
I think it's probably not a good idea to cache the entire metapage. The
only things that you are "really" trying to cache, I think, are
hashm_maxbucket, hashm_lowmask, and hashm_highmask. The entire
HashPageMetaData structure is 696 bytes on my machine, and it doesn't
really make sense to copy the whole thing into memory if you only need 16
bytes of it. It could even be dangerous -- if somebody tries to rely on
the cache for some other bit of data and we're not really guaranteeing that
it's fresh enough for that.
I'd suggest defining a new structure HashMetaDataCache with members
hmc_maxbucket, hmc_lowmask, and hmc_highmask. The structure can have a
comment explaining that we only care about having the data be fresh enough
to test whether the bucket mapping we computed for a tuple is still
correct, and that for that to be the case we only need to know whether a
bucket has suffered a new split since we last refreshed the cache.
The comments in this patch need some work, e.g.:
-
+ oopaque->hasho_prevblkno = maxbucket;
No comment?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

On 09/28/2016 11:55 AM, Mithun Cy wrote:
On Tue, Sep 27, 2016 at 1:53 AM, Jeff Janes wrote:
> I think that this needs to be updated again for v8 of concurrent and v5
of wal
Adding the rebased patch over [1] + [2]
As the concurrent hash index patch was committed in 6d46f4 this patch
needs a rebase.
I have moved this submission to the next CF.
Thanks for working on this !
Best regards,
Jesper
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Oct 4, 2016 at 11:55 PM, Jeff Janes wrote:
>Can you describe your benchmarking machine? Your benchmarking data went
up to 128 clients. But how many cores does the machine have? Are
>you testing how well it can use the resources it has, or how well it can
deal with oversubscription of the resources?
It is a power2 machine with 192 hyperthreads.
Architecture: ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list: 0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s): 4
Model: IBM,8286-42A
L1d cache: 64K
L1i cache: 32K
L2 cache: 512K
L3 cache: 8192K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
NUMA node2 CPU(s): 96-143
NUMA node3 CPU(s): 144-191
>Also, was the file supposed to be named .ods? I didn't find it to be
openable as an .odc file.
Yes .ods right it is a spreadsheet in ODF.
--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy
wrote:
> I have created a patch to cache the meta page of Hash index in
> backend-private memory. This is to save reading the meta page buffer every
> time when we want to find the bucket page. In “_hash_first” call, we try
> to read meta page buffer twice just to make sure bucket is not split after
> we found bucket page. With this patch meta page buffer read is not done, if
> the bucket is not split after caching the meta page.
>
> Idea is to cache the Meta page data in rd_amcache and store maxbucket
> number in hasho_prevblkno of bucket primary page (which will always be NULL
> other wise, so reusing it here for this cause!!!). So when we try to do
> hash lookup for bucket page if locally cached maxbucket number is greater
> than or equal to bucket page's maxbucket number then we can say given
> bucket is not split after we have cached the meta page. Hence avoid reading
> meta page buffer.
>
> I have attached the benchmark results and perf stats (refer
> hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
> Benchmark results). There we can see improvements at higher clients, as
> lwlock contentions due to buffer read are more at higher clients. If I
> apply the same patch on Amit's concurrent hash index patch [1] we can see
> improvements at lower clients also. Amit's patch has removed a heavy weight
> page lock which was the bottle neck at lower clients.
>
> [1] Concurrent Hash Indexes
>
Hi Mithun,
Can you describe your benchmarking machine? Your benchmarking data went up
to 128 clients. But how many cores does the machine have? Are you testing
how well it can use the resources it has, or how well it can deal with
oversubscription of the resources?
Also, was the file supposed to be named .ods? I didn't find it to be
openable as an .odc file.
Cheers,
Jeff

On 09/05/2016 02:50 PM, Mithun Cy wrote:
On Sep 2, 2016 7:38 PM, "Jesper Pedersen"
wrote:
Could you provide a rebased patch based on Amit's v5 ?
Please find the the patch, based on Amit's V5.
I have fixed following things
1. now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see if bucket is from older version hashindex and
has been upgraded. Since as of now InvalidBlockNumber is one value greater
than maximum value the variable "metap->hashm_maxbucket" can be set (see
_hash_expandtable). We can distinguish it from rest. I tested the upgrade
issue reported by amit. It is fixed now.
2. One case which buckets hasho_prevblkno is used is where we do backward
scan. So now before testing for previous block number I test whether
current page is bucket page if so we end the bucket scan (see changes in
_hash_readprev). On other places where hasho_prevblkno is used it is not
for bucket page, so I have not put any extra check to verify if is a bucket
page.
I think that the
+ pageopaque->hasho_prevblkno = metap->hashm_maxbucket;
trick should be documented in the README, as hashm_maxbucket is defined
as uint32 where as hasho_prevblkno is a BlockNumber.
(All bucket variables should probably use the Bucket definition instead
of uint32).
For the archives, this patch conflicts with the WAL patch [1].
[1]
https://www.postgresql.org/message-id/CAA4eK1JS%2BSiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-%3D0w%40mail.gmail.com
Best regards,
Jesper
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Tue, Sep 6, 2016 at 12:20 AM, Mithun Cy wrote:
> On Sep 2, 2016 7:38 PM, "Jesper Pedersen"
> wrote:
>> Could you provide a rebased patch based on Amit's v5 ?
>
> Please find the the patch, based on Amit's V5.
>
I think you want to say based on patch in the below mail:
https://www.postgresql.org/message-id/CAA4eK1J6b8O4PcEPqRxNYbLVbfToNMJEEm%2Bqn0jZX31-obXrJw%40mail.gmail.com
It is better if we can provide the link for a patch on which the
current patch is based on, that will help people to easily identify
the dependent patches.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Sep 2, 2016 7:38 PM, "Jesper Pedersen"
wrote:
> Could you provide a rebased patch based on Amit's v5 ?
Please find the the patch, based on Amit's V5.
I have fixed following things
1. now in "_hash_first" we check if (opaque->hasho_prevblkno ==
InvalidBlockNumber) to see if bucket is from older version hashindex and
has been upgraded. Since as of now InvalidBlockNumber is one value greater
than maximum value the variable "metap->hashm_maxbucket" can be set (see
_hash_expandtable). We can distinguish it from rest. I tested the upgrade
issue reported by amit. It is fixed now.
2. One case which buckets hasho_prevblkno is used is where we do backward
scan. So now before testing for previous block number I test whether
current page is bucket page if so we end the bucket scan (see changes in
_hash_readprev). On other places where hasho_prevblkno is used it is not
for bucket page, so I have not put any extra check to verify if is a bucket
page.
cache_hash_index_metapage_onAmit_v5_01.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On 07/22/2016 06:02 AM, Mithun Cy wrote:
I have created a patch to cache the meta page of Hash index in
backend-private memory. This is to save reading the meta page buffer every
time when we want to find the bucket page. In “_hash_first” call, we try to
read meta page buffer twice just to make sure bucket is not split after we
found bucket page. With this patch meta page buffer read is not done, if
the bucket is not split after caching the meta page.
Idea is to cache the Meta page data in rd_amcache and store maxbucket
number in hasho_prevblkno of bucket primary page (which will always be NULL
other wise, so reusing it here for this cause!!!). So when we try to do
hash lookup for bucket page if locally cached maxbucket number is greater
than or equal to bucket page's maxbucket number then we can say given
bucket is not split after we have cached the meta page. Hence avoid reading
meta page buffer.
I have attached the benchmark results and perf stats (refer
hash_index_perf_stat_and_benchmarking.odc [sheet 1: perf stats; sheet 2:
Benchmark results). There we can see improvements at higher clients, as
lwlock contentions due to buffer read are more at higher clients. If I
apply the same patch on Amit's concurrent hash index patch [1] we can see
improvements at lower clients also. Amit's patch has removed a heavy weight
page lock which was the bottle neck at lower clients.
Could you provide a rebased patch based on Amit's v5 ?
Best regards,
Jesper
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Thu, Aug 4, 2016 at 3:36 AM, Tom Lane wrote:
> Jeff Janes writes:
>> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy
>> wrote:
>>> I have created a patch to cache the meta page of Hash index in
>>> backend-private memory. This is to save reading the meta page buffer every
>>> time when we want to find the bucket page. In “_hash_first” call, we try to
>>> read meta page buffer twice just to make sure bucket is not split after we
>>> found bucket page. With this patch meta page buffer read is not done, if the
>>> bucket is not split after caching the meta page.
>
> Is this really safe? The metapage caching in btree is all right because
> the algorithm is guaranteed to work even if it starts with a stale idea of
> where the root page is. I do not think the hash code is equally robust
> about stale data in its metapage.
>
I think stale data in metapage could only cause problem if it leads to
a wrong calculation of bucket based on hashkey. I think that
shouldn't happen. It seems to me that the safety comes from the fact
that required fields (lowmask/highmask) to calculate the bucket won't
be changed more than once without splitting the current bucket (which
we are going to scan). Do you see a problem in hashkey to bucket
mapping (_hash_hashkey2bucket), if the lowmask/highmask are changed by
one additional table half or do you have something else in mind?
>
>> What happens on a system which has gone through pg_upgrade?
>
> That being one reason why. It might be okay if we add another hasho_flag
> bit saying that hasho_prevblkno really contains a maxbucket number, and
> then add tests for that bit everyplace that hasho_prevblkno is referenced.
>
Good idea.
- if (retry)
+ if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)
This code seems to be problematic with respect to upgrades, because
hasho_prevblkno will be initialized to 0x without the patch.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jeff Janes writes:
> On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy wrote:
>> I have created a patch to cache the meta page of Hash index in
>> backend-private memory. This is to save reading the meta page buffer every
>> time when we want to find the bucket page. In â_hash_firstâ call, we try
>> to
>> read meta page buffer twice just to make sure bucket is not split after we
>> found bucket page. With this patch meta page buffer read is not done, if the
>> bucket is not split after caching the meta page.
Is this really safe? The metapage caching in btree is all right because
the algorithm is guaranteed to work even if it starts with a stale idea of
where the root page is. I do not think the hash code is equally robust
about stale data in its metapage.
>> Idea is to cache the Meta page data in rd_amcache and store maxbucket number
>> in hasho_prevblkno of bucket primary page (which will always be NULL other
>> wise, so reusing it here for this cause!!!).
> If it is otherwise unused, shouldn't we rename the field to reflect
> what it is now used for?
No, because on other pages that still means what it used to. Nonetheless,
I agree that's a particularly ugly wart, and probably a dangerous one.
> What happens on a system which has gone through pg_upgrade?
That being one reason why. It might be okay if we add another hasho_flag
bit saying that hasho_prevblkno really contains a maxbucket number, and
then add tests for that bit everyplace that hasho_prevblkno is referenced.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On Fri, Jul 22, 2016 at 3:02 AM, Mithun Cy wrote:
> I have created a patch to cache the meta page of Hash index in
> backend-private memory. This is to save reading the meta page buffer every
> time when we want to find the bucket page. In “_hash_first” call, we try to
> read meta page buffer twice just to make sure bucket is not split after we
> found bucket page. With this patch meta page buffer read is not done, if the
> bucket is not split after caching the meta page.
>
> Idea is to cache the Meta page data in rd_amcache and store maxbucket number
> in hasho_prevblkno of bucket primary page (which will always be NULL other
> wise, so reusing it here for this cause!!!).
If it is otherwise unused, shouldn't we rename the field to reflect
what it is now used for?
What happens on a system which has gone through pg_upgrade? Are we
sure that those on-disk representations will always have
InvalidBlockNumber in that fields? If not, then it seems we can't
support pg_upgrade at all. If so, I don't see a provision for
properly dealing with pages which still have InvalidBlockNumber in
them. Unless I am missing something, the code below will always think
it found the right bucket in such cases, won't it?
if (opaque->hasho_prevblkno <= metap->hashm_maxbucket)
Cheers,
Jeff
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers