Oracle Blog

Michael Corcoran's Weblog

Monday Mar 21, 2011

Just wanted to share a neat trick I recently learned using mdb. Quite often you set a breakpoint in a hot code path but only want to stop when a certain value is passed in. The following is a way to do so:
[0]> func+4::bp -c "
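The command above is cut off in the archived post, so here is only a hedged sketch of the general shape, not the original (`func+4` and the register are placeholders; on SPARC the first argument arrives in %o0, and mdb's ::bp -c attaches a dcmd string that runs each time the breakpoint fires):

```
[0]> func+4::bp -c "<o0=K"
```

Each hit prints the first argument in hex, so you can watch for the value of interest; the exact stop-on-match incantation depends on the mdb/kmdb version at hand.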

Monday Jun 13, 2005

Welcome to OpenSolaris!
In this entry, I'll walk through the basics of page coalescing within Solaris.
I tend to focus on SPARC since that is my main background, but the logic is
actually in common code so it works on both SPARC and x86 architectures. I'll
use SPARC in this post, as most SPARC chips support four page sizes (8K, 64K,
512K and 4M).

Before getting into the details of page coalescing, some basic concepts need
to be introduced.
In Solaris, a page_t is the structure which tracks information for a page of
memory, and the base size of memory which is tracked is 8K on SPARC (4K on
x86). Now, for efficiency reasons we may prefer to use larger page sizes, as
they may consume fewer hardware resources (I don't have enough time to go into
the intricacies of the hardware and its support for large pages, but different
processors handle large pages differently). So, a 64K page is actually a
collection of 8K pages in which the first 8K page is aligned on a 64K
physical boundary. Inside the page_t this is denoted by the p_szc field,
which stands for "page size code." For SPARC, a p_szc of 0 corresponds
to an 8K page, 1 to a 64K page, 2 to 512K and 3 to 4M. The kernel
maintains a freelist for each size code, and pages are demoted or
promoted depending upon the needs of the system, so they can move between
the different-sized freelists.

So, page coalescing is the process used to create large pages from smaller pages.
Over time, memory in the system gets fragmented, and often most free memory ends
up on the smallest-sized freelist. Now, if a request for a 4M page comes in
and there are no 4M pages available, the kernel has to go out and create
a page of this size. The first step in this process is page coalescing,
which tries to combine adjacent free pages into a large free page of the
desired size. This is the job of page_freelist_coalesce.
If this fails, the kernel can get more aggressive and relocate allocated
pages out of a range to create a large free page. I'll get into that in a
future entry.

Now we get to the good stuff, which OpenSolaris has made that much easier to
share: how the code works.

Let's look at page_freelist_coalesce via the source browser, and you'll find
that the function is quite macro-heavy. There are a few things to note here
about the macros:

The PAGE_COUNTERS_* macros manipulate the page_counters data structure, which
contains all of the info we need to coalesce free pages into larger free pages.
Check out vm_pagelist.c for the definition of this structure as well as
comments on the basics of what the stored data represents; you'll need to read
through that comment to understand the rest of this entry.

The PGCTRS_CANDS_* macros manipulate the page_ctrs_cands data structure, which
is an optimization on top of page_counters that essentially keeps track
of popcounts for given regions of memory. We'll see how this optimization
works shortly.

So, after reading the code and comments above, let's look at the function
page_freecnt to get a better understanding of what the data looks like and
how it is used. page_freecnt returns the number of free base-pagesize (8K for
SPARC) pages which exist at page pp for the given szc.

So, let's say that we want to see how many free pages there are within a
specific 4M range of memory. We call page_freecnt, and take a look in the
page_counters data structures to figure out how many free pages there are.
For this example, we'll give only the data which is relevant to that single
4M range of memory, and I'll use some very bad ascii art to help explain:

Here is the page_counters data for our example 4M range. Regions 3, 2, 1 and 0
correspond to the array indexes in the page_counters array (only the bins the
walkthrough refers to are drawn):

Region 3 (4M):     |            6            |
Region 2 (512K):   |  8  |  8  |  5  |  ...  |
Region 1 (64K):    |  8  |  8  |  7  |  ...  |
Region 0 (8K):     the individual 8K pages themselves

static hw_page_map_t *page_counters[MMU_PAGE_SIZES];

Now the confusing part :) The above represents 4 512K pages being free,
19 64K pages being free, and 23 8K pages being free, for a total of
(4*64) + (19*8) + 23 = 431 free 8K pages.

Now let's see how we calculate this information:

So, in the art above, assume we want to create the 4M page corresponding to
the page_counters bin that currently has a 6 in it. Here's how the logic
goes:

1. We know there are 6 512K pages which either already exist or can be created
from smaller pages. Thus, cnt starts at (6*64), as it takes 64 8K pages to make
a 512K page. So, cnt is currently 384.

2. Now, going to the next level, we see two 8's. We skip over these, as they
were already counted in the step above; we don't care what their current size
is, all we care about is that they can be used to create a 512K page. In
addition, we see a 5 at this level, and since it takes 8 8K pages to make a
64K page, we need to add (5*8) to cnt, so cnt is now 424.

3. Now we go to the next level, and we see two 8's again, which will have
contributed to the 64K page count we calculated above. So, we then look
at the 7, which represents 7 free 8K pages; we add this to cnt and get a
grand total of 431, which matches what is actually out there in terms of
the real sizes of the free pages.

Monday May 30, 2005

I've finally decided to jump on the blogging bandwagon. My name's Michael Corcoran and I've been working at Sun for over six years now. When I first joined Sun, I was working in one of the support organizations for the high end server group. After fixing many bugs in various areas I moved on to join the performance group focusing on high end systems which brings me to my general performance job of today. I mainly work with the virtual memory subsystem and love looking at core files to track down problems. This intro is just to get me started and used to blogging. In the future, I'd like to talk about core file analysis as well as some of the VM features of Solaris.