Clean up dns prefetch code, and also port it.
- remove slave threads and use HostResolver in asynchronous mode instead (while still limiting number of concurrent lookups)
- make the implementation portable and make DnsMaster unit test compile and pass on Linux
- add more tests to DnsMaster unit test to simulate various shutdown scenarios, concurrent lookups, and to verify that we don't exceed our limit of concurrent lookup requests)
- remove some tests which relied on specifics of slaves' inner working
- adjust initialization and shutdown of dns prefetching (now it relies on the IO message loop being present)
Bonus: shutdown is almost instant now, no need to have a timeout.
BUG=5687, 6683

There are some bad things in this patch I don't quite know what do to with.
Example:
- linking stub for l10n_util. We can remove it and not link dns_master_unittest
on Linux until it can be nicely ported.
- ifdefs in dns_global: as above
- we don't have a timeout waiting for dns slaves to exit; I don't know how to do
it portably; we probably need to extend PlatformThread or something similar.
But the good news is, dns_master_unittest passes on Linux with this patch.

Dean McNamee

Yeah, ok, this patch is too hacky/slashy, and too much at once to review. Could ...

Yeah, ok, this patch is too hacky/slashy, and too much at once to review. Could
you break this review into two:
1) The small obvious changes, 1 -> 1U, int -> size_t, std::string& ->
std::string. This will be an easy review, and then you can get that committed,
make sure it works on Windows, etc.
2) The ugly stuff, hopefully also in smaller pieces (perhaps 2 reviews). There
will probably be a lot of discussion on 2, and there is no point in blocking 1
on it.
I have a hard time reviewing things when it's a mix of simple obvious changes
and I want to say LGTM, but then there are a whole bunch of big changes also.
Thanks
-- dean
http://codereview.chromium.org/15076/diff/1/6
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/1/6#newcode422
Line 422: if (!PlatformThread::Create(stack_size, slave_instance,
&thread_handles_[slave_count_]))
80 col
http://codereview.chromium.org/15076/diff/1/8
File chrome/browser/net/dns_master.h (right):
http://codereview.chromium.org/15076/diff/1/8#newcode42
Line 42: static const unsigned int kSlaveCountMax = 8;
we don't use unsigned like this. size_t maybe?

Evan Martin

+agl (who's been doing some hack'n'slash as well, where Dean's comments might help)

For my change to make render_process_host build, I also had to remove an include
of browser.h.
I wonder if we could just wrap the *headers* that haven't been ported with an
"#if !defined(OS_LINUX)"? Maybe that's too easy to get confused about, thoguh.

On 2008/12/19 10:58:21, Dean McNamee wrote:
> Yeah, ok, this patch is too hacky/slashy, and too much at once to review.
Could
> you break this review into two:
>
> 2) The ugly stuff, hopefully also in smaller pieces (perhaps 2 reviews).
There
> will probably be a lot of discussion on 2, and there is no point in blocking 1
> on it.
This is the rest of stuff. I think the good way to split it is to have two
patches:
- threading changes (switching to PlatformThread or some other portable thing)
- switchable DNS functions refactoring
What do you think?

Evan Martin

Just making sure this patch isn't lost -- I think Dean remains the right person ...

This looks mostly OK to me, but this is Jar's code so you're going to have to
get a review from him.
Darin, there was once a promise that base::Thread would be the only consumer of
PlatformThread. There are now tons of code using PlatformThread directly :(.
Should / could the DNS threads be message loop oriented? We had agreed on this
frequently ignored banner in platform_thread.h.
// WARNING: You should *NOT* be using this class directly. PlatformThread is
// the low-level platform-specific abstraction to the OS's threading interface.
// You should instead be using a message-loop driven Thread, see thread.h.
http://codereview.chromium.org/15076/diff/601/802
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/601/802#newcode460
Line 460: thread_handles_[slave_count_] = 0;
From platform_thread.h
// PlatformThreadHandle should not be assumed to be a numeric type, since the
// standard intends to allow pthread_t to be a structure. This means you
// should not initialize it to a value, like 0.
http://codereview.chromium.org/15076/diff/601/801
File chrome/browser/net/dns_master_unittest.cc (right):
http://codereview.chromium.org/15076/diff/601/801#newcode38
Line 38: // Provide network function stub to run tests offline (and avoid the
variance
Provide _a_ network function stub, or leave it as stubs.

http://codereview.chromium.org/15076/diff/601/802
File chrome/browser/net/dns_master.cc (left):
http://codereview.chromium.org/15076/diff/601/802#oldcode437
Line 437: ResumeThread(handle); // WINAPI call.
I don't see the equivalent of this in the port. You used a CREATED_SUSPENDED
flag above.... but I don't see where you start up the thread.
...on the other hand... if the ported call actually starts the thread
immediately....
IMO, you should not start the thread until you've completed the initialization
(re: putting pointer to instance into slaves_[] and bumping slave_count_).
Maybe it would be OK, but it would require a lot more code review to verify that
this semantic change was safe.
http://codereview.chromium.org/15076/diff/601/802
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/601/802#newcode455
Line 455: PlatformThread::Join(thread_handles_[i]);
This join may take a long time in general. The problem is that a DNS resolution
may be pending... and the thread can't realize it should terminate until that is
complete. DNS resolutions can sometimes take in the range of 8 seconds (or
longer :-/). I like the idea of waiting for the threads to shut down... but you
can see in the original code I actually had a timer that would "leak" threads
as needed to shut down "fast enough."
Is there a reasonable way to preserve the semantics to "shut down within N
seconds?"
http://codereview.chromium.org/15076/diff/601/802#newcode460
Line 460: thread_handles_[slave_count_] = 0;
There should be some way (perhaps another member array?) provided to test for
shut down if you continue to provide the above time-out fallback for the join.
This code is being "extra careful" in its cleanup by zeroing out these fields...
but it is not semanticly needed, so the assignment can certainly be removed.
On the other hand, if there is any requisite destructor for these
"thread_handles_" it should be called.
If you have don't use the join semantics (and allow timeouts), then you'll need
some way to cleanup only the threads that terminated "in time." If you only
have the join (with arbitrary wait time), then just removing this "extra
cleanup" would suffice.
On 2009/01/06 12:19:23, Dean McNamee wrote:
> From platform_thread.h
>
> // PlatformThreadHandle should not be assumed to be a numeric type, since the
> // standard intends to allow pthread_t to be a structure. This means you
> // should not initialize it to a value, like 0.
>

Paweł Hajdan Jr.

On 2009/01/06 18:37:39, jar wrote: > ...on the other hand... if the ported call actually ...

On 2009/01/06 18:37:39, jar wrote:
> ...on the other hand... if the ported call actually starts the thread
> immediately....
That's what happens.
> IMO, you should not start the thread until you've completed the initialization
> (re: putting pointer to instance into slaves_[] and bumping slave_count_).
> Maybe it would be OK, but it would require a lot more code review to verify
that
> this semantic change was safe.
It seems to me it's ok, but of course, a more detailed review would be needed. I
think that MessageLoop may help here, but I don't understand all of its
properties.
> http://codereview.chromium.org/15076/diff/601/802
> File chrome/browser/net/dns_master.cc (right):
>
> http://codereview.chromium.org/15076/diff/601/802#newcode455
> Line 455: PlatformThread::Join(thread_handles_[i]);
> This join may take a long time in general. The problem is that a DNS
resolution
> may be pending... and the thread can't realize it should terminate until that
is
> complete. DNS resolutions can sometimes take in the range of 8 seconds (or
> longer :-/). I like the idea of waiting for the threads to shut down... but
you
> can see in the original code I actually had a timer that would "leak" threads
> as needed to shut down "fast enough."
>
> Is there a reasonable way to preserve the semantics to "shut down within N
> seconds?"
I couldn't find any with current PlatformThread. Maybe the solution with
MessageLoop allows that, or the PlatformThread can be extended to include the
timeout. Yeah, I was also worried about this shutdown issue.
> http://codereview.chromium.org/15076/diff/601/802#newcode460
> Line 460: thread_handles_[slave_count_] = 0;
> There should be some way (perhaps another member array?) provided to test for
> shut down if you continue to provide the above time-out fallback for the join.
>
> This code is being "extra careful" in its cleanup by zeroing out these
fields...
> but it is not semanticly needed, so the assignment can certainly be removed.
>
> On the other hand, if there is any requisite destructor for these
> "thread_handles_" it should be called.
For PlatformThread, it's Join.

Paweł Hajdan Jr.

ping. Darin, do you have some ideas about MessageLoop and the discussion above?

On 2009/01/12 18:31:50, darin wrote:
> I'm not sure I understood the question about MessageLoops...
Sorry, I will provide some summary of discussion so far:
- There are many limitations of this patch's approach, like not having a timeout
on shutdown.
- The code also uses PlatformThread instead of WinAPI calls to be portable, but
that is probably wrong because we have Thread with MessageLoop and comments in
PlatformThread's .h file say it should not be used directly:
// WARNING: You should *NOT* be using this class directly. PlatformThread is
// the low-level platform-specific abstraction to the OS's threading interface.
// You should instead be using a message-loop driven Thread, see thread.h.
- I thought that maybe MessageLoop would provide solution to the absent timeout
problem.
But I don't say we should necessarily use MessageLoops. The main problem is how
to generally portably spawn the threads, give them work, and shutdown them with
timeout.

darin (slow to review)

OK, I understand now. Taking a step back, I think the right answer is to ...

OK, I understand now. Taking a step back, I think the right answer is to change
this code to be based on HostResolver (from net/base/) instead. You could just
leak the HostResolver objects at shutdown.
You could also use WorkerPool, which is what HostResolver does, in case you
really need background threads.

jar (doing other things)

Sorry I didn't send this sooner. I wrote it up over a week ago... but ...

Sorry I didn't send this sooner. I wrote it up over a week ago... but never
"published."
Jim
http://codereview.chromium.org/15076/diff/601/802
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/601/802#newcode419
Line 419: slave_count_++;
OK... I read your comment indicating that the thread is automatically started.
...and then I tried to look at the semantic safety of doing these
initializations after the thread has started.
I think you'll be safe because this method is always called in the context of
the global master DNS lock being held (hence its name, PreLocked...())
As a result, none of the other threads (slaves, or arbitrary calls to master)
will be able to access these protected variables (slave_count_ and slaves[])
until you release the lock. ...so I'm pretty sure you're safe.
Message loop does not play a role in this, as master may be called by any/all
MessageLoop threads concurrently.

On 2009/01/12 18:50:51, darin wrote:
> OK, I understand now. Taking a step back, I think the right answer is to
change
> this code to be based on HostResolver (from net/base/) instead.
Thanks. I used HostResolver and Thread, and the unit tests pass on Linux.
Problems I could see and don't quite know what to do with:
- I tried un-disabling tests in dns_master_unittests.cc to see if they pass.
Some of them don't, and I added TODOS in code. Not to actually commit that, but
to make sure that the info doesn't get lost. These tests seem to assume that
only one slave thread will be spawned. I don't know why.
- Sometimes the first test (OsCachesLookup...) doesn't pass. I don't like this
flakiness. What do you suggest?
- We do a lot more heap allocation/deallocation of small objects in this code
than it was before. Can it have negative effects on performance (memory
fragmentation)?

darin (slow to review)

I'm a little confused by the usage of Thread and HostResolver. You can use HostResolver ...

I'm a little confused by the usage of Thread and HostResolver. You can use
HostResolver either in synchronous mode or asynchronous mode. When you use it
in asynchronous mode, it uses a thread pool to call getaddrinfo. When you use
it in synchronous mode, it just calls getaddrinfo on the current thread. To
select synchronous mode, you just pass null as the HostResolver::Resolve's
callback param.
Now, in this patch, it appears that you are using Thread to create a thread
pool, and you are using HostResolver in asynchronous mode. It seems like you
should either switch to synchronous mode HostResolver or ditch the custom thread
pool.

http://codereview.chromium.org/15076/diff/1022/1024
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/1022/1024#newcode63
Line 63: AutoLock auto_lock(lock_);
All the results_[] contents could be asynchronously accessed by the resolving
thread (dns_slave service threads), which may be delayed, and we could probably
crash if we erased the structure while we were trying to access it.
The slave_threads only access this data under the protection of the lock, and so
those lookups (here) should be done under protection of the lock.
On 2009/01/20 21:09:13, darin wrote:
> i wonder if we actually need this lock_ member anymore.

Paweł Hajdan Jr.

This is the version with no slaves, and just async HostResolvers. It limits the number ...

This is the version with no slaves, and just async HostResolvers. It limits the
number of concurrent lookups (currently to 8).
Things remaining to do (I'll do them):
- test on Windows
- remove dns_slave.* from vcproj
- adjust other code using dns_master (I changed the constructor signature)
I haven't done this yet because there might be further changes to the used
approach.
http://codereview.chromium.org/15076/diff/1022/1024
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/1022/1024#newcode22
Line 22: class DnsMaster::LookupCallback : public net::CompletionCallback,
On 2009/01/20 21:09:13, darin wrote:
> the more canonical way to use CompletionCallback is to have a member variable
of
> type CompletionCallbackImpl<LookupCallback>. see
> net/http/http_network_transaction.{h,cc} for example.
I don't see a way to do something similar here. We'll have more than one
callback, possible up to kMaxConcurrentLookups. In theory I could keep track of
them in DnsMaster, but it would complicate things and I don't see benefit of it.
http://codereview.chromium.org/15076/diff/1022/1024#newcode63
Line 63: AutoLock auto_lock(lock_);
On 2009/01/20 21:09:13, darin wrote:
> i wonder if we actually need this lock_ member anymore.
I could do some testing with base/thread_collision_warner. In DnsMaster itself
there will be only one thread accessing its data (callbacks run on the same
thread). Everything depends on what else uses the master.

Dean McNamee

Just some minor style nits. I will leave the real review up to darin/jar. Btw, ...

> > the more canonical way to use CompletionCallback is to have a member
variable
> of
> > type CompletionCallbackImpl<LookupCallback>. see
> > net/http/http_network_transaction.{h,cc} for example.
>
> I don't see a way to do something similar here. We'll have more than one
> callback, possible up to kMaxConcurrentLookups. In theory I could keep track
of
> them in DnsMaster, but it would complicate things and I don't see benefit of
it.
What I had in mind was that you would still have the LookupCallback class, but
that you could give that class a CompletionCallbackImpl<LookupCallback> member
variable. The end result is basically the same. Instead of inheriting a
virtual function, you have a member object that implements the virtual function.
The member object then calls a non-virtual member function of the
LookupCallback class. This pattern is a bit more readable because the
non-virtual member function can be named however you like it and it gets a
single "int" parameter.

darin (slow to review)

Taking both of my suggestion together, you might just end up with a Lookup class ...

Taking both of my suggestion together, you might just end up with a Lookup class
that has a HostResolver. The Lookup class would have its own Start method that
takes the hostname to resolve. Then the HostResolver usage would be completely
encapsulated within the Lookup class.

Paweł Hajdan Jr.

Dean: Thanks for your comments. I hopefully fixed the nits. I also tweaked my .emacs ...

Dean: Thanks for your comments. I hopefully fixed the nits. I also tweaked my
.emacs to point out some of these issues.
Darin: After applying your ideas the code is a bit simpler. Thanks.
I also updated other files which are using DnsMaster and removed deleted files
from vcproj.
What do you think now? Do you have some ideas about additional tests? I'm
sending this patch to the trybot now.

darin (slow to review)

http://codereview.chromium.org/15076/diff/1451/1455 File chrome/browser/net/dns_master.cc (right): http://codereview.chromium.org/15076/diff/1451/1455#newcode39 Line 39: if (revoked()) even when revoked, we probably still ...

Here are a pile of comments. One of the big ones discusses how to go about
removing the locks completely. I'm pretty convinced that is very doable.
Please put me on the review list for any final CL.
Thanks,
Jim
http://codereview.chromium.org/15076/diff/1022/1024
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/1022/1024#newcode32
Line 32: LOG(INFO) << "callback created for " << hostname;
These log calls (plus the ones below) should probably be DLOG, or else they will
swamp the logging output. Perchance they are even just temporary for debugging?
http://codereview.chromium.org/15076/diff/1022/1024#newcode39
Line 39: return;
Is the plan to leak this instance during shutdown as needed? Makes sense to
me... but perhaps you could add a comment.
http://codereview.chromium.org/15076/diff/1022/1024#newcode48
Line 48: delete addresses_;
Perchance this is an example of the small objects you were talking about (re:
fear of multiple allocations and deallocations)? I'd agree that it is a waste
to create these and then destroy them. If indeed addresses are plural, then
you're not even sure how many you'll be getting, and the size of the object may
be worse.
It would be very nice if the resolver had an interface such that if you passed
in a NULL pointer, then the addresses would not be retured... and there would be
less need to alloc/free.
http://codereview.chromium.org/15076/diff/1022/1024#newcode54
Line 54: net::AddressList* addresses_;
Perhaps the above two items could be made scoped_pointers, so that the deletion
will not need to be explicit in the above RunWithParams() method.
http://codereview.chromium.org/15076/diff/1022/1024#newcode63
Line 63: AutoLock auto_lock(lock_);
Darin suggested that you wanted all this to run on one thread, and hence
wouldn't need locking. Can you make such an assertion clear in the header
comments if that is the case?
If so, you'll certainly get away with running without internal locks, and you'll
get to instead depend on the message loop to serialize accesses.
If you go this route... you need to synchronize the following calls:
a) UI thread calls during omnibox typing. That should get onto your thread via
a posted task.
b) Renderer calls which currently arrive via the IPC thread. They are generally
calls with a vector of lookup, and can easily be synchronized via a posted task.
These include both page scans, and mouse-over notifications (requesting DNS
resolutions).
c) Network observer calls: These are a little tricky. They were originally used
to facilitate measurement of the value of this feature (i.e., to time actual
latency costs of DNS resolutions during real user navigations). Now those calls
are also the basis of the adaptive prefetching, and the value (re: latency
reduction) is the basis for keeping/discarding the "learned" adaptive
prefetches. As a result, you really need this hook. The slight problem is that
that the hook gets called twice asyncronously, and you need to correlate those
calls (start resolve vs resolve finished) to know what duration corresponds to
what name. That would be a hassle to do via message passing (I was using the
handle as an opaque pointer, stored in a map)... but... Darin has informed me
that those asynchronous calls *all* come back on the IO thread. So.... if you
ran the this whole DNS controller on the IO thread, you'd be able to maintain
the correlation easily, and again, not need locks.
Also of course, the callbacks from the resolver need to be synchonized with this
thread, but there again message passing should do the trick.
I'm (in my heart) still a little nervous about the overhead latency of the
message passing. Measurements using histograms have shown that although the
queing delay has a long slow tail, 99% complete in under 1ms... so I probably
shouldn't be so fearful. We do need to carefully measure the re-write to see
that nothing significant has changed in terms of performance.
On 2009/01/20 21:19:46, jar wrote:
> All the results_[] contents could be asynchronously accessed by the resolving
> thread (dns_slave service threads), which may be delayed, and we could
probably
> crash if we erased the structure while we were trying to access it.
>
> The slave_threads only access this data under the protection of the lock, and
so
> those lookups (here) should be done under protection of the lock.
>
> On 2009/01/20 21:09:13, darin wrote:
> > i wonder if we actually need this lock_ member anymore.
>
>
http://codereview.chromium.org/15076/diff/1022/1024#newcode374
Line 374: const std::string hostname(name_buffer_.front());
Can you move these constructors down to just before use? I'd rather see the
SetAssignedState() called sooner, so that we get timing data which includes the
construction time (even if we have a hard time including destruction time).
http://codereview.chromium.org/15076/diff/1022/1024#newcode401
Line 401: PreLockedScheduleLookups();
It might be nicer to call the PreLockedScheduleLookups after we've really
terminated this thread (resolver?).
Perhaps it could appear around where you do the delete in RunWithParams... and
then possibly (I'm not yet sure how these objecets are used) you could get away
with object re-use, rather than a construction of a new set followed immediately
by a destruction of an old set.
http://codereview.chromium.org/15076/diff/1022/1030
File net/base/host_resolver.h (right):
http://codereview.chromium.org/15076/diff/1022/1030#newcode45
Line 45: // could not be completed synchronously, in which case the result code
will
I didn't understand this comment. IS there a typo?
Is is really an error when we ask to run Asnchronously but we can't run
synchrnously? (which isn't what we asked for?)

jar (doing other things)

Nice direction! I look forward to seeing if you can complete the transition to using ...

Nice direction! I look forward to seeing if you can complete the transition to
using PostTask, and not significantly reduce the speed (effectiveness) of the
feature. I'm convinced it can be done with the new network stack.
http://codereview.chromium.org/15076/diff/1480/1487
File chrome/browser/net/dns_master.cc (right):
http://codereview.chromium.org/15076/diff/1480/1487#newcode369
Line 369: pending_lookups_++;
Should there be an else clause that deletes request when we can't start it? If
you want to be robust in the face of this error, you may as well cleanup (I
think). Alternatively, you could just change the DCHECK to CHECK and get rid
of the if.
http://codereview.chromium.org/15076/diff/1480/1487#newcode383
Line 383: PreLockedScheduleLookups();
nit: I think these two lines (as well as the corresponding lines in
SetNoSuchNameState() should actually effectively appear in one place at the end
of OnLookupFinished. It produces less code, and makes more sense that when we
"delete this" (the lookup) that we diminish the number of pending_lookups_ in
master, and try to schedule another lookup. It might even go in the destructor.
...but since pending_lookups_ is really a member of master_, there should
probably be a method on master_ that resolvers can call to say they are now
actively resolving, or they're done (so the master should create a new one). I
noticed you passed the info to master_ to increment pending_lookups_ via a
return value, and that seemed reasonable (but again an access method would by
symmetrical).
That sort of approach might some day allow
http://codereview.chromium.org/15076/diff/1480/1488
File chrome/browser/net/dns_master.h (right):
http://codereview.chromium.org/15076/diff/1480/1488#newcode125
Line 125: // all places where it's currently used, or not at all.
I think it is currently needed... so comment should be amended to indicate that
it is possible to restructure to run only on the IO thread, and remove this
lock_, but then all APIs should be called (from other threads) via PostTask().
http://codereview.chromium.org/15076/diff/1480/1489
File chrome/browser/net/dns_master_unittest.cc (left):
http://codereview.chromium.org/15076/diff/1480/1489#oldcode341
Line 341: TEST(DnsMasterTest, DISABLED_MultiThreadedSpeedupTest) {
Can you create any test that will show that you are limiting the number of
simultaneous lookups? Can you prove that you're doing several in parallel?
http://codereview.chromium.org/15076/diff/1480/1489
File chrome/browser/net/dns_master_unittest.cc (right):
http://codereview.chromium.org/15076/diff/1480/1489#newcode138
Line 138: if (lookups_with_improvement > (all_lookups / 2))
We used to be definite (5 out of 5), but you said that was too flaky. Going
for a 50-50 win will work with almost no correlation... it goes up and down..
you'll always win, and test nothing :-(. IMO, you should get at least to a 90%
win, with greater than 5 tries. The test is then (using cross-multiplication):
lookups_with_improvement * 10 > all_lookups * 9
http://codereview.chromium.org/15076/diff/1480/1489#newcode141
Line 141: EXPECT_GT(lookups_with_improvement, all_lookups / 2);
Same comparison change goes here. You also need a comment that "run forever" is
a failure.

On 2009/01/22 01:03:00, jar wrote:
> http://codereview.chromium.org/15076/diff/1480/1487
> File chrome/browser/net/dns_master.cc (right):
>
> http://codereview.chromium.org/15076/diff/1480/1487#newcode369
> Line 369: pending_lookups_++;
> Should there be an else clause that deletes request when we can't start it?
If
> you want to be robust in the face of this error, you may as well cleanup (I
> think). Alternatively, you could just change the DCHECK to CHECK and get rid
> of the if.
Fixed.
> http://codereview.chromium.org/15076/diff/1480/1487#newcode383
> Line 383: PreLockedScheduleLookups();
> nit: I think these two lines (as well as the corresponding lines in
> SetNoSuchNameState() should actually effectively appear in one place at the
end
> of OnLookupFinished. It produces less code, and makes more sense that when we
> "delete this" (the lookup) that we diminish the number of pending_lookups_ in
> master, and try to schedule another lookup. It might even go in the
destructor.
>
> ...but since pending_lookups_ is really a member of master_, there should
> probably be a method on master_ that resolvers can call to say they are now
> actively resolving, or they're done (so the master should create a new one).
I
> noticed you passed the info to master_ to increment pending_lookups_ via a
> return value, and that seemed reasonable (but again an access method would by
> symmetrical).
>
> That sort of approach might some day allow
I would like to decrement pending_lookups where it's incremented - DnsMaster. I
could add one more member to DnsMaster which would be called by LookupRequest,
but maybe a better way would be to unify the existing Set*State function into
one OnLookupFinished(hostname, success) member of DnsMaster. What do you think?
> http://codereview.chromium.org/15076/diff/1480/1488
> File chrome/browser/net/dns_master.h (right):
>
> http://codereview.chromium.org/15076/diff/1480/1488#newcode125
> Line 125: // all places where it's currently used, or not at all.
> I think it is currently needed... so comment should be amended to indicate
that
> it is possible to restructure to run only on the IO thread, and remove this
> lock_, but then all APIs should be called (from other threads) via PostTask().
Fixed.
> http://codereview.chromium.org/15076/diff/1480/1489
> File chrome/browser/net/dns_master_unittest.cc (left):
>
> http://codereview.chromium.org/15076/diff/1480/1489#oldcode341
> Line 341: TEST(DnsMasterTest, DISABLED_MultiThreadedSpeedupTest) {
> Can you create any test that will show that you are limiting the number of
> simultaneous lookups? Can you prove that you're doing several in parallel?
I added few nice tests that should cover these cases. I also cleaned up
testing-only methods in dns_master.h to be private and used FRIEND_TEST.
> http://codereview.chromium.org/15076/diff/1480/1489
> File chrome/browser/net/dns_master_unittest.cc (right):
>
> http://codereview.chromium.org/15076/diff/1480/1489#newcode138
> Line 138: if (lookups_with_improvement > (all_lookups / 2))
> We used to be definite (5 out of 5), but you said that was too flaky. Going
> for a 50-50 win will work with almost no correlation... it goes up and down..
> you'll always win, and test nothing :-(. IMO, you should get at least to a
90%
> win, with greater than 5 tries. The test is then (using
cross-multiplication):
> lookups_with_improvement * 10 > all_lookups * 9
Indeed. I fixed the test, but had to do few not-so-nice things to make it
non-flaky on Linux. 90% success rate is achievable, but this patch sets it to
75% so the test runs faster. Generally it looks like Linux caches positive dns
request from the start, but only caches negative dns requests after some point.
Do you have some ideas how to imrove this test?

Switched to CancelableRequest. It made earlier crashes also appear consistently
on Linux, which I fixed - see http://codereview.chromium.org/19534 . So this
revision doesn't crash on Linux and my Windows machine, and the trybot.
The problem is that it leaks DnsMaster's callback when shut down when a lookup
is pending. I tried to fix, but then it crashed again. I'll try to discuss it on
irc. Just uploading now to show the status.

Paweł Hajdan Jr.

This is code after simplification we talked about on irc. No leaks, and passes trybot ...

On 2009/02/03 12:51:09, Dean McNamee wrote:
> I'm going to leave this to Darin/Jim. I looked over it briefly and it seems
> good, but I'm not familiar with the DNS system.
In my recent quest for better commit messages, it might be a good idea to expand
upon the current one a bit:
Clean up dns prefetch code, and also port it.
BUG=5687, 6683
This doesn't really tell us to much. There are some bigger changes here, and we
should try to detail more specifically about what has changed. Then instead of
"also port it", mention that it now builds under gcc/linux/mac whatever and
passes X tests.
Also, I believe our bug syntax probably doesn't have spaces around the ,
BUG=1,2,3.
Thanks

Paweł Hajdan Jr.

Dean: thanks for good suggestions. I updated the description. I tested the BUG syntax in ...

I started changing the init/shutdown and PostTask things we discussed on irc
(this is not finished yet; I have some questions).
Now init/shutdown happens in ResourceDispatcherHost and some of functions in
dns_global.cc use PostTask to execute dns_master code in IO thread.
Please look if the way I PostTask looks correct, and if it can be improved.
I'm also not sure how to handle functions in dns_global.cc which return values
(possible through pointer parameters). I thought about posting a task that would
signal a WaitableEvent and wait for that event in calling function. But it seems
complicated, and maybe there's some other way.

darin (slow to review)

it looks like you are using PostTask to get over to the io thread properly. ...

it looks like you are using PostTask to get over to the io thread properly.
note: g_browser_process->io_thread() will assert if called on a thread other
than the main UI thread of the browser. i'm not sure if that will cause any
problems for some of the consumers of dns_global.

I'm pretty wary of the relocation of the intialization code. I think it will
give away a lot of startup performance. See comments below.
I'm psyched about the overall change (and will LOVE to see it ported).
Can you suggest any compromises or alternatives??
thanks,
Jim
http://codereview.chromium.org/15076/diff/3437/2508
File chrome/browser/renderer_host/resource_dispatcher_host.cc (right):
http://codereview.chromium.org/15076/diff/3437/2508#newcode173
Line 173: chrome_browser_net::DnsPrefetchHostNamesAtStartup(
Moving initialization of the DNS system to this point will significantly slow
startup.
As an experiment, I put a breakpoint on this initialization call, and then
started up my browser.
When the break point was hit, I already had 5 DNS resolutions complete. Worse
yet (at least in my anectdotal experiment) the latency of most of these
resolutions was in excess of 500ms. This means that this change probably delays
the start of the initial resolutions by about a half a second. In cases such as
my anectdotal experiment, such a delay would probably mean that the resolutino
would not be complete before it was needed, and the full latency would be
visible to a user.
For many users, I'd expect this will mean a startup delay (till page load
complete).
Is there any we we can get this kicked off sooner? For example, if the issue is
that we're waiting for some threads to come up, perhaps we can consider
re-ordering thread startups (<sigh> but that seems pretty big <sigh>).
We might end up having a special set of threads tha are only used at startup to
achieve only startup-time hostname pre-resolution. It would be special purpose
code, but the savings in startup time seems quite remarkable (i.e., too much to
ignore).
I've also just finished working in the persisting of the adaptive DNS resolution
(the patch is being code reviewed.. I'll copy you). That info about required
subresources on pages, currently works in concert with the whole DNS system to
accelerate page displays. The good news is that so long as this initialization
precedes any page navigations, it will work. Hence this will probably be a
reasonable place to land those initializations (am I correct, that this point
will be reached before any navigations begin?)

Paweł Hajdan Jr.

On 2009/02/16 20:36:43, jar wrote: > I'm pretty wary of the relocation of the intialization ...

On 2009/02/16 20:36:43, jar wrote:
> I'm pretty wary of the relocation of the intialization code. I think it will
> give away a lot of startup performance. See comments below.
>
> Can you suggest any compromises or alternatives??
I see no problem starting it in some other place. The only requirement is
existing IO message loop I think. So, maybe BrowserProcessImpl?
Darin, can you comment about the best (and early) place for dns prefetching
startup? Or maybe it can stay in browser_main.cc? I'm a bit confused with this
startup sequence...
And now, if we move startup away from ResourceDispatcherHost (compared to last
changeset), maybe moving dns prefetching shutdown to other place would be good
too.

Paweł Hajdan Jr.

Darin, can you look again? I think the issues we discussed should be fixed now. ...

On 2009/02/18 19:28:27, darin wrote:
> // Shutdown DNS prefetching now to ensure that network
> // stack objects living on the IO thread get destroyed
> // before the IO thread goes away.
Thanks, fixed.
> http://codereview.chromium.org/15076/diff/3450/2523
> File chrome/browser/net/dns_global.h (left):
>
> http://codereview.chromium.org/15076/diff/3450/2523#oldcode51
> Line 51: class DnsPrefetcherInit {
> hmm... maybe we could keep this class, but change the dtor to call
> FreeDnsPrefetchResources instead? would that allow you to avoid having to
> encode a call to FreeDnsPrefetchResources manually?
Done.
> we could perhaps tweak FreeDnsPrefetchResources to assert that
> ShutdownDnsPrefetch was called.
We already assert in DnsMaster's dtor.
> http://codereview.chromium.org/15076/diff/3450/2529
> File chrome/browser/net/dns_master_unittest.cc (right):
>
> http://codereview.chromium.org/15076/diff/3450/2529#newcode70
> Line 70: net::EnsureWinsockInit();
> btw, i wonder if this is still needed.
>
> HostResolver's ctor also calls this function. i think we only had the
explicit
> call to EnsureWinsockInit because we were using getaddrinfo directly. now, we
> might be able to remove it from here and even browser_main.cc.
I would do this in the follow-up to avoid the risk of further delaying this
patch.
Please also look at my comment about Lock in dns_master.h. Do you have
suggestion how to write it better? I think that explicitly listing all protected
variables would make the comment long (and it might go out of sync with future
updates).