Thank you very much, Dave, for your response.
You are right. Only the text/html content is mapped to the URI /roller-ui/rendering/page,
where it is caught by PageServlet, which invokes the JPA named query for the weblog. All the resource files
are mapped to the URI '/roller-ui/rendering/resources'. Roller is very complicated, indeed.
Now I would like to ask one more question. We now know that each request for a weblog page
results in one JPA named query, that is, one database select query. What if someone launches
an attack on the weblog pages of a Roller site? While the registration and login pages can be
protected with a captcha, weblog pages have to withstand whatever comes at them. The bottleneck
of Roller will then be the database server. Roller should be easy to scale up by different
means, such as clustering.
What do you think we should do to protect Roller against the attack described above?
Do you think it would be better if we used a cache for last-modified?
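To illustrate the idea, here is a rough sketch of an in-memory last-modified cache. It is only an assumption of how such a cache could look, not Roller's actual code: the class LastModifiedCache, its methods, and the loader function (which stands in for the JPA named query) are all hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in Roller.
final class LastModifiedCache {
    // weblog handle -> last-modified time in epoch milliseconds
    private final ConcurrentHashMap<String, Long> cache = new ConcurrentHashMap<>();
    // Stands in for the JPA named query; assumed never to return null.
    private final Function<String, Long> loader;
    // Counts how often the loader (the "database") is actually consulted.
    final AtomicInteger databaseHits = new AtomicInteger();

    LastModifiedCache(Function<String, Long> loader) {
        this.loader = loader;
    }

    // Returns the cached value, loading it from the database on a miss only.
    long get(String weblogHandle) {
        return cache.computeIfAbsent(weblogHandle, handle -> {
            databaseHits.incrementAndGet();
            return loader.apply(handle);
        });
    }

    // To be called whenever an entry of the weblog is added or changed.
    void invalidate(String weblogHandle) {
        cache.remove(weblogHandle);
    }
}
```

With something like this, repeated requests for the same weblog would reach the database only once between two changes to that weblog. In a cluster, the invalidation would also have to reach every node, which this sketch does not cover.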
Thank you very much.
David
--- On Tue, 5/25/10, Dave <snoopdave@gmail.com> wrote:
From: Dave <snoopdave@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming.xia@ibol.biz
Date: Tuesday, May 25, 2010, 8:47 AM
On Fri, May 21, 2010 at 12:09 PM, (David) Ming Xia
<david.ming.xia@ibol.biz> wrote:
> This is about the implementation of conditional GET in Roller 4.0.1.
> As far as I can see, Roller 4.0.1 supports conditional GET. Upon a request, Roller checks
> the ‘If-Modified-Since’ field in the HTTP header and compares it with the ‘Last-Modified’
> attribute on the server side. It then responds either with a fresh page and status code 200,
> or with status code 304.
That is true for blog pages and feeds only.
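For reference, the check described above boils down to something like the following. This is only a sketch of the general conditional GET decision, not Roller's actual code; the class ConditionalGet and the method statusFor are hypothetical names, and parsing of the HTTP date header is assumed to have happened already.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper; not part of Roller.
final class ConditionalGet {
    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    /**
     * @param ifModifiedSince parsed If-Modified-Since header, or null if absent
     * @param lastModified    last-modified time of the weblog on the server
     * @return 200 if a fresh page must be sent, 304 if the client copy is current
     */
    static int statusFor(Instant ifModifiedSince, Instant lastModified) {
        if (ifModifiedSince == null) {
            return SC_OK; // unconditional request: always send the page
        }
        // HTTP dates have one-second resolution, so compare at that precision.
        Instant serverTime = lastModified.truncatedTo(ChronoUnit.SECONDS);
        return serverTime.isAfter(ifModifiedSince) ? SC_OK : SC_NOT_MODIFIED;
    }
}
```

The cost under discussion is in obtaining lastModified for each request, not in the comparison itself.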
> What concerns me is the part that retrieves ‘Last-Modified’. It is implemented
> in org.apache.roller.weblogger.ui.rendering.servlets.PageServlet. Attached you can see the
> sequence diagram, which depicts the related classes.
I don't see any sequence diagram. This mailing list does not accept
attachments. Perhaps you could post the picture somewhere and send a
URL?
> Every time a weblog entry is added or changed, the ‘last-modified’
> field of the corresponding website table is updated. For any HTTP
> request, PageServlet has to go through a JPA named query to get the
> ‘last-modified’ value. That value is not cached in memory, and the
> entities do not float across contexts (anyhow...).
> So as far as I can see, it is a hard query.
> But for one page view, there are usually at least ten HTTP requests, including requests
> for the text/html file, CSS files, JS files, images, and so on.
Right, but CSS files and JS files that are file system resources
(theme files, etc.) are served directly by the Servlet Engine, which
has its own conditional GET implementation, and NOT through the Roller
PageServlet.
> So for 10000 simultaneous page requests, there will be at least 100000 simultaneous database
> queries. Furthermore, in any serious production environment, the database and application
> server are on different tiers and the connection is encrypted with SSL. So the picture to
> me is that it is fine for a limited number of concurrent users, but when the request volume
> goes up, the server may suddenly choke.
When something in a weblog changes, we invalidate the weblog's cache
and this works well because there are a lot more reads than writes. There might be
a couple of bloggers and thousands of readers and subscribers. So, the
cache is rarely invalidated.
And like I said, the page servlet caches only pages so what you said
about 100,000 database queries is not true unless you are storing CSS,
JS and other static resources as Roller page templates -- which you
should not be doing.
- Dave