During the crawling your site is hit and cache is generated if you have some implemented, for example phpFastCache.
I just only crawl via IPv4 without HTTP keep alive (only for better performance). Setting some unique user agent is good for parsing in access_log, too.
-X stands for excluding, also handy for improving performance on large sites.
-o outputs the wget status report where you can search for HTTP status codes as 404, 403, 500, etc.
Remember the excluded path, you might run another wget in parallel if you need. Unfortunately wget can’t run parallel threads as of writing.