Message-ID: <529D7ECE.6020302@gmail.com>
Date: Tue, 03 Dec 2013 07:48:46 +0100
From: Emmanuel Lécharny
To: Apache Directory Developers List
Subject: Cache : there is some room for improvement...
Hi !
The latest numbers I got are quite interesting, now that we are correctly
leveraging the caches (alias cache, ParentIdAndRdn cache aka PIAR cache,
entry cache). Still, the way we configure and initialize the caches is
far from perfect. I'll summarize here some findings I gathered over the
last few weeks.
1) Cache is critical to performance.
When we process a search, there are many places where we access the
backend (be it JDBM or Mavibot) and we would gain from not doing so. By
adding a cache for Aliases and ParentIdAndRdn, I was able to get a 25%
speed improvement (assuming the cache is hit every time). The same goes
for the entry cache : having all the entries loaded into the cache is a
major speed factor.
So we need big entry, alias and ParentIdAndRdn caches, that's for sure.
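To make the point concrete, here is a minimal sketch of a read-through
cache sitting in front of a backend lookup. It is not the ApacheDS code:
ReadThroughCache is a hypothetical class, and a plain LinkedHashMap in
LRU mode stands in for EhCache so that the example stays self-contained.
The backend is just a function from key to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through LRU cache; NOT the ApacheDS or EhCache API. */
class ReadThroughCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> backend; // stands in for a JDBM/Mavibot lookup
    private long hits;
    private long misses;

    ReadThroughCache(final int maxSize, Function<K, V> backend) {
        this.backend = backend;
        // accessOrder = true: iteration order goes from least to most recently used
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used element once the cache is full
                return size() > maxSize;
            }
        };
    }

    synchronized V get(K key) {
        V value = entries.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        // Cache miss: this is the expensive path we want to avoid
        misses++;
        value = backend.apply(key);
        if (value != null) {
            entries.put(key, value);
        }
        return value;
    }

    synchronized long hits() {
        return hits;
    }

    synchronized long misses() {
        return misses;
    }
}
```

With maxSize set to 1, any search touching two different entries in turn
goes back to the backend on every access, which is why the cache size
matters so much.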
2) The cache configuration is not perfect.
I discovered that the entry cache was initialized with a size of 1
cached entry... Obviously, it's a bit tight. But the problem is that
whatever configuration you set, this value doesn't change !
So I fixed that (the ugly way).
The real problem is that the cache configuration and initialization is a
mess... We use a CacheService class (good thing !) which is not
initialized in some tests, so I had to check that the cache is not null
before using it in many parts of the code. This has to be fixed. The
various caches (alias, entry, PIAR) aren't all initialized in
AbstractBTreePartition, for instance.
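One possible way to get rid of the scattered null checks is to hand out
a do-nothing cache when no real cache is available. The sketch below is
only an illustration of that idea: SimpleCache, NoOpCache, MapCache and
Caches are hypothetical names, not classes from ApacheDS or EhCache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal cache contract; not an ApacheDS or EhCache type. */
interface SimpleCache<K, V> {
    V get(K key);

    void put(K key, V value);
}

/** Never stores anything, so every lookup is a miss. */
final class NoOpCache<K, V> implements SimpleCache<K, V> {
    @Override
    public V get(K key) {
        return null;
    }

    @Override
    public void put(K key, V value) {
        // Intentionally empty
    }
}

/** Trivial unbounded cache, only here to have a "real" implementation. */
final class MapCache<K, V> implements SimpleCache<K, V> {
    private final Map<K, V> map = new ConcurrentHashMap<>();

    @Override
    public V get(K key) {
        return map.get(key);
    }

    @Override
    public void put(K key, V value) {
        map.put(key, value);
    }
}

final class Caches {
    private Caches() {
    }

    /** Returns the given cache, or a no-op one if it was never initialized. */
    static <K, V> SimpleCache<K, V> orNoOp(SimpleCache<K, V> cacheOrNull) {
        if (cacheOrNull != null) {
            return cacheOrNull;
        }
        return new NoOpCache<>();
    }
}
```

The null check is then done once, where the cache is obtained, and the
calling code treats an uninitialized cache as a cache that always misses.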
We also have various cache configurations :
- partition cache
- index cache
It is not clear which parameter is used for which cache. We have to get
this fixed.
3) Backend cache and ApacheDS cache
The backend cache and the ADS cache are two different things. In
Mavibot, we cache Pages. In JDBM, we also cache Pages. In ADS, we cache
entries, aliases, etc. At the moment, the configuration does not make it
clear which cache is being set (although the index cacheSize parameter
is only used to set the backend cache size).
The thing is that in JDBM, each single index can have its own cache,
whereas the cache is global in Mavibot. In other words, we can't really
assume that configuring the backend cache is something generic.
Besides, we are using EhCache, with a dedicated configuration file for
it. It would be good not to have to manipulate this file at all, and to
have the whole cache configuration in the ADS config.
Well, there is some room for improvement in this area.
4) Which cache should we favor ?
The backend page cache is useless if the ADS caches are loaded, except
when we are using indexes. That means we need both. The thing is that
what is expensive when browsing a BTree is not only fetching pages from
the disk, but also deserializing them. It would be good to keep the
index pages in memory (as we don't have any cache at the ADS level for
indexes) and not to cache the MasterTable (as we have an EntryCache) nor
the RdnIndex (for the same reason : we already have the PIAR cache).
This requires some information to be propagated to the backend cache
(do *not* cache this BTree, do cache this one...).
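The "cache this BTree, not that one" information could be carried by
something as simple as a per-BTree hint table. This is only a sketch
under assumptions: PageCachePolicy and BackendCacheHints are hypothetical
names, the BTree names "master" and "rdn" are placeholders, and neither
JDBM nor Mavibot is claimed to expose such a hook today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical policy: should the backend keep the pages of a BTree in memory? */
enum PageCachePolicy {
    CACHE_PAGES,
    BYPASS
}

/** Hypothetical hint table passed from the server down to the backend. */
final class BackendCacheHints {
    private final Map<String, PageCachePolicy> hints = new HashMap<>();
    private final PageCachePolicy defaultPolicy;

    BackendCacheHints(PageCachePolicy defaultPolicy) {
        this.defaultPolicy = defaultPolicy;
    }

    BackendCacheHints hint(String btreeName, PageCachePolicy policy) {
        hints.put(btreeName, policy);
        return this;
    }

    PageCachePolicy policyFor(String btreeName) {
        return hints.getOrDefault(btreeName, defaultPolicy);
    }

    /** Index pages are cached by default; BTrees already covered by an ADS cache are not. */
    static BackendCacheHints adsDefaults() {
        return new BackendCacheHints(PageCachePolicy.CACHE_PAGES)
            .hint("master", PageCachePolicy.BYPASS) // covered by the entry cache
            .hint("rdn", PageCachePolicy.BYPASS);   // covered by the PIAR cache
    }
}
```

For instance, BackendCacheHints.adsDefaults().policyFor("objectClass")
falls back to CACHE_PAGES, since no hint says otherwise for that index.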
There is room for improvement here.
5) What if we have enough memory ?
90% of the raw search time is spent cloning entries. We *have* to
avoid cloning the entry if we want to get better performance. This is
what we should work on.
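How to avoid the clone is still open. One direction, given here purely
as an assumption and not as what the server does, would be to keep
cached entries read-only and share them between readers, so that only
the code that really needs to modify an entry pays for a copy.
SharedEntry below is a hypothetical, simplified stand-in for a real
entry class (a DN plus a map of attribute values).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical read-only entry that can be shared between searches without cloning. */
final class SharedEntry {
    private final String dn;
    private final Map<String, List<String>> attributes;

    SharedEntry(String dn, Map<String, List<String>> attributes) {
        this.dn = dn;
        // Copy once, when the entry is put in the cache, then freeze it
        Map<String, List<String>> frozen = new HashMap<>();
        for (Map.Entry<String, List<String>> attr : attributes.entrySet()) {
            frozen.put(attr.getKey(),
                Collections.unmodifiableList(new ArrayList<>(attr.getValue())));
        }
        this.attributes = Collections.unmodifiableMap(frozen);
    }

    String dn() {
        return dn;
    }

    /** Readers get the shared, unmodifiable view: no copy is made here. */
    Map<String, List<String>> attributes() {
        return attributes;
    }

    /** Only callers that need to modify the entry pay for a deep copy. */
    Map<String, List<String>> mutableCopy() {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> attr : attributes.entrySet()) {
            copy.put(attr.getKey(), new ArrayList<>(attr.getValue()));
        }
        return copy;
    }
}
```

The trade-off is that every code path that currently modifies the entry
it gets back from a search would have to ask for a copy explicitly.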
Regardless, if we don't have enough memory, at the end of the day, the
server will hit the disk and we will get much lower performance (by at
least one order of magnitude). This is something to keep in mind when
doing perf tests : we are NOT testing the disk performance, we are
testing the server performance. Running a benchmark when there is not
enough memory to have cache loaded is a waste of time, as the impact of
disk reads is so huge it will hide any improvement we can make on the
server.
Soooo : we need enough memory to run the server ! The problem is : how
much memory do we need ? This is the tricky part...
--
Regards,
Cordialement,
Emmanuel Lécharny
www.iktek.com