The Moz Blog

25 Killer Combos for Google's Site: Operator

There’s an app for everything – the problem is that we’re so busy chasing the newest shiny toy that we rarely stop to learn to use simple tools well. As a technical SEO, one of the tools I seem to never stop finding new uses for is the site: operator. I recently devoted a few slides to it in my BlueGlassX presentation, but I realized that those 5 minutes were just a tiny slice of all of the uses I’ve found over the years.

People often complain that site:, by itself, is inaccurate (I’ll talk about that more at the end of the post), but the magic is in the combination of site: with other query operators. So, I’ve come up with two dozen killer combos that can help you dive deep into any site.

1. site:example.com

Ok, this one’s not really a combination, but let’s start with the basics. Paired with a root domain or sub-domain, the [site:] operator returns an estimated count of the number of indexed pages for that domain. The “estimated” part is important, but we’ll get to that later. For a big picture, I generally stick to the root domain (leave out the “www”, etc.).

Each combo in this post will have a clickable example (see below). I'm picking on Amazon.com in my examples, because they're big enough for all of these combos to come into play:

You’ll end up with two bits of information: (1) the actual list of pages in the index, and (2) the count of those pages:

I think we can all agree that 273,000,000 results is a whole lot more than most of us would want to sort through. Even if we wanted to do that much clicking, Google would stop us after 100 pages. So, how can we get more sophisticated and drill down into the Google index?
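If you find yourself running these combos often, it can be handy to script the lookup and just open the results in a browser. The query goes into Google's standard `q` parameter; here's a minimal Python sketch (the helper name is mine, not an official API):

```python
from urllib.parse import urlencode

def google_search_url(query):
    """Build a Google web-search URL for an arbitrary query string."""
    return "https://www.google.com/search?" + urlencode({"q": query})

print(google_search_url("site:example.com"))
# https://www.google.com/search?q=site%3Aexample.com
```

`urlencode` takes care of escaping the colon, quotes, and spaces, so any of the combos below can be dropped in as-is.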

2. site:example.com/folder

The simplest way to dive deeper into this mess is to provide a sub-folder (like “/blog”) – just append it to the end of the root domain. Don’t let the simplicity of this combo fool you – if you know a site’s basic architecture, you can use it to drill down into the index quickly and spot crawl problems.

3. site:sub.example.com

You can also drill down into specific sub-domains. Just use the full sub-domain in the query. I generally start with #1 to sweep up all sub-domains, but #3 can be very useful for situations like tracking down a development or staging sub-domain that may have been accidentally crawled.

4. site:example.com inurl:www

The "inurl:" operator searches for specific text in the indexed URLs. You can pair “site:” with “inurl:” to find the sub-domain in the full URL. Why would you use this instead of #3? On the one hand, "inurl:" will look for the text anywhere in the URL, including the folder and page/file names. For tracking sub-domains this may not be desirable. However, "inurl:" is much more flexible than putting the sub-domain directly into the main query. You'll see why in examples #5 and #6.

5. site:example.com -inurl:www

Adding [-] to most operators tells Google to search for anything but that particular text. In this case, by separating out "inurl:www", you can change it to "-inurl:www" and find any indexed URLs that are not on the "www" sub-domain. If "www" is your canonical sub-domain, this can be very useful for finding non-canonical URLs that Google may have crawled.

6. site:example.com -inurl:www -inurl:dev -inurl:shop

I'm not going to list every possible combination of Google operators, but keep in mind that you can chain most operators. Let's say you suspect there are some stray sub-domains, but you aren't sure what they are. You are, however, aware of "www.", "dev." and "shop.". You can chain multiple "-inurl:" operators to remove all of these known sub-domains from the query, leaving you with a list of any stragglers.
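Chained queries like this are easy to mistype by hand. If you build them often, a small helper keeps the syntax straight. This is a hypothetical sketch (the function and parameter names are my own, not part of any Google API):

```python
def site_query(domain, inurl=(), not_inurl=(), phrase=None):
    """Assemble a site: query with chained inurl:/-inurl: operators."""
    parts = ["site:" + domain]
    parts += ["inurl:" + term for term in inurl]
    parts += ["-inurl:" + term for term in not_inurl]
    if phrase:
        parts.append('"' + phrase + '"')  # exact-match phrase (see combo #11)
    return " ".join(parts)

print(site_query("example.com", not_inurl=("www", "dev", "shop")))
# site:example.com -inurl:www -inurl:dev -inurl:shop
```

The same builder covers combos #4 through #12: pass terms in `inurl` to include them, `not_inurl` to exclude them, and `phrase` for quoted text.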

7. site:example.com inurl:https

You can't put a protocol directly into "site:" (e.g. "https:", "ftp:", etc.). Fortunately, you can put "https" into an "inurl:" operator, allowing you to see any secure pages that Google has indexed. As with all "inurl:" queries, this will find "https" anywhere in the URL, but it's relatively rare to see it somewhere other than the protocol.

8. site:example.com inurl:param

URL parameters can be a Panda's dream. If you're worried about something like search sorts, filters, or pagination, and your site uses URL parameters to create those pages, then you can use "inurl:" plus the parameter name to track them down. Again, keep in mind that Google will look for that name anywhere in the URL, which can occasionally cause headaches.

Pro Tip: Try out the example above, and you'll notice that "inurl:ref" returns any URL with "ref" in it, not just traditional URL parameters. Be careful when searching for a parameter that is also a common word.

9. site:example.com -inurl:param

Maybe you want to know how many search pages are being indexed without sorts or how many product pages Google is tracking with no size or color selection – just add [-] to your "inurl:" statement to exclude that parameter. Keep in mind that you can combine "inurl:" with "-inurl:", specifically including some parameters and excluding others. For complex, e-commerce sites, these two combos alone can have dozens of uses.

10. site:example.com text goes here

Of course, you can always combine the "site:" operator with a plain-old text query. This will search the contents of the entire page within the given site. Like standard queries, this is essentially a logical [AND], but it's a bit of a loose [AND] – Google will try to match all terms, but those terms may be separated on the page or you may get back results that only include some of the terms. You'll see that the example below matches the phrase "free Kindle books" but also phrases like "free books on Kindle".

11. site:example.com “text goes here”

If you want to search for an exact-match phrase, put it in quotes. This simple combination can be extremely useful for tracking down duplicate and near-duplicate copy on your site. If you're worried about one of your product descriptions being repeated across dozens of pages, for example, pull out a few unique terms and put them in quotes.

12. site:example.com/folder “text goes here”

This is just a reminder that you can combine text (with or without quotes) with almost any of the combinations previously discussed. Narrow your query to just your blog or your store pages, for example, to really target your search for duplicates.

13. site:example.com this OR that

If you specifically want a logical [OR], Google does support use of "or" in queries. In this case, you'd get back any pages indexed on the domain that contained either "this" or "that" (or both, as with any logical [OR]). This can be very useful if you've forgotten exactly which term you used or are searching for a family of keywords.

Edit: Hat Tip to TracyMu in the comments - this is one case where capitalization matters. Either use "OR" in all-caps or the pipe "|" symbol. If you use lower-case "or", Google could interpret it as part of a phrase.

14. site:example.com “top * ways”

The asterisk [*] can be used as a wildcard in Google queries to replace unknown text. Let's say you want to find all of the "Top X" posts on your blog. You could use "site:" to target your blog folder and then "Top *" to query only those posts.

15. site:example.com “top 7..10 ways”

If you have a specific range of numbers in mind, you can use "X..Y" to return anything in the range from X to Y. While the example above is probably a bit silly, you can use ranges across any kind of on-page data, from product IDs to prices.

16. site:example.com ~word

The tilde [~] operator tells Google to find words related to the word in question. Let's say you wanted to find all of the posts on your blog related to the concept of consulting – just add "~consulting" to the query, and you'll get the wider set of terms that Google thinks are relevant.

17. site:example.com ~word -word

By using [-] to exclude the specific word, you can tell Google to find any pages related to the concept that don't specifically target that term. This can be useful when you're trying to assess your keyword targeting or create new content based on keyword research.

18. site:example.com intitle:”text goes here”

The "intitle:" operator only matches text that appears in the <TITLE></TITLE> tag. One of the first spot-checks I do on any technical SEO audit is to use this tactic with the home-page title (or a unique phrase from it). It can be incredibly useful for quickly finding major duplicate content problems.

19. site:example.com intitle:”text * here”

You can use almost any of the variations mentioned in (12)-(17) with "intitle:" – I won't list them all, but don't be afraid to get creative. Here's an example that uses the wildcard search in #14, but targets it specifically to page titles.

Pro Tip: Remember to use quotes around the phrase after "intitle:", or Google will view the query as a one-word title search plus straight text. For example, "intitle:text goes here" will look for "text" in the title plus "goes" and "here" anywhere on the page.

20. intitle:”text goes here”

This one's not really a "site:" combo, but it's so useful that I had to include it. Are you suspicious that other sites may be copying your content? Just put any unique phrase in quotes after "intitle:" and you can find copies across the entire web. This is the fastest and cheapest way I've found to find people who have stolen your content. It's also a good way to make sure your article titles are unique.

21. “text goes here” -site:example.com

If you want to get a bit more sophisticated, you can use "-site:" and exclude mentions of copy on any domain (including your own). This can be used with straight text or with "intitle:" (like in #20). Including your own site can be useful, just to get a sense of where your ranking ability stacks up, but subtracting out your site allows you to see only the copies.

22. site:example.com intext:”text goes here”

The "intext:" operator looks for keywords in the body of the document, but doesn't search the <TITLE> tag. The text could appear in the title, but Google won't look for it there. Oddly, "intext:" will match keywords in the URL (seems like a glitch to me, but I don't make the rules).

23. site:example.com ”text goes here” -intitle:"text goes here"

You might think that #22 and #23 are the same, but there's a subtle difference. If you use "intext:", Google will ignore the <TITLE> tag, but it won't specifically remove anything with "text goes here" in the title. If you specifically want to remove any title mentions in your results, then use "-intitle:".

24. site:example.com filetype:pdf

One of the drawbacks of "inurl:" is that it will match any string in the URL. So, for example, searching on "inurl:pdf", could return a page called "/guide-to-creating-a-great-pdf". By using "filetype:", you can specify that Google only search on the file extension. Google can detect some filetypes (like PDFs) even without a ".pdf" extension, but others (like "html") seem to require a file extension in the indexed document.

25. site:.edu “text goes here”

Finally, you can target just the Top-Level Domain (TLD), by leaving out the root domain. This is more useful for link-building and competitive research than on-page SEO, but it's definitely worth mentioning. One of our community members, Himanshu, has an excellent post on his own blog about using advanced query operators for link-building.

Why No Allintitle: & Allinurl:?

Experienced SEOs may be wondering why I left out the operators "allintitle:" and "allinurl:" – the short answer is that I've found them increasingly unreliable over the past couple of years. Using "intitle:" or "inurl:" with your keywords in quotes is generally more predictable and just as effective, in my opinion.

Putting It All to Work

I want to give you a quick case study to show that these combos aren't just parlor tricks. I once worked with a fairly large site that we thought was hit by Panda. It was an e-commerce site that allowed members to spin off their own stores (think Etsy, but in a much different industry). I discovered something very interesting just by using "site:" combos (all URLs are fictional, to protect the client):

(1) site:example.com = 11M

First, I found that the site had a very large number (11 million) of indexed pages, especially relative to its overall authority. So, I quickly looked at the site architecture and found a number of sub-folders. One of them was the "/stores" sub-folder, which contained all of the member-created stores:

(2) site:example.com/stores = 8.4M

Over 8 million pages in Google's index were coming just from those customer stores, many of which were empty. I was clearly on the right track. Finally, simply by browsing a few of those stores, I noticed that every member-created store had its own internal search filters, all of which used the "?filter" parameter in the URL. So, I narrowed it down a bit more:

(3) site:example.com/stores inurl:filter = 6.7M

Over 60% of the indexed pages for this site were coming from search filters on user-generated content. Obviously, this was just the beginning of my work, but I found a critical issue on a very large site in less than 30 minutes, just by using a few simple query operator combos. It didn't take an 8-hour desktop crawl or millions of rows of Excel data – I just had to use some logic and ask the right questions.
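The arithmetic behind that 60% figure is trivial, but worth sanity-checking when the counts are this large. Using the (rounded) counts from the case study:

```python
total   = 11_000_000  # site:example.com
stores  = 8_400_000   # site:example.com/stores
filters = 6_700_000   # site:example.com/stores inurl:filter

print(round(100 * stores / total))   # ~76% of the index is member stores
print(round(100 * filters / total))  # ~61% is store search filters
```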

How Accurate Is Site:?

Historically, some SEOs have complained that the numbers you get from "site:" can vary wildly across time and data centers. Let's cut to the chase: they're absolutely right. You shouldn't take any single number you get back as absolute truth. I ran an experiment recently to put this to the test. Every 10 minutes for 24 hours, I automatically queried the following:

site:seomoz.org

site:seomoz.org/blog

site:seomoz.org/blog intitle:spam

Even using a fixed IP address (single data center, presumably), the results varied quite a bit, especially for the broad queries. The range for each of the "site:" combos across 24 hours (144 measurements) was as follows:

67,700 – 114,000

8,590 – 8,620

40 – 40

Across two sets of IPs (unique C-blocks), the range was even larger (see the "/blog" data):

67,700 – 114,000

4,580 – 8,620

40 – 40

Does that mean that "site:" is useless? No, not at all. You just have to be careful. Sometimes, you don't even need the exact count – you're just interested in finding examples of URLs that match the pattern in question. Even if you need a count, the key is to drill down. The narrowest range in the experiment was completely consistent across 24 hours and both data centers. The more you drill down, the better off you are.
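If you want to quantify "drilling down helps," one rough measure is the spread of the repeated counts relative to the low end. A quick sketch, using the ranges from the experiment above:

```python
def relative_spread(low, high):
    """(high - low) / low, expressed as a percentage."""
    return 100 * (high - low) / low

print(round(relative_spread(67_700, 114_000)))  # broad site: query: ~68% swing
print(round(relative_spread(8_590, 8_620), 2))  # /blog query: ~0.35%
print(relative_spread(40, 40))                  # intitle: combo: 0.0
```

The broad query swings by roughly two-thirds of its own low end, while the narrow combo doesn't move at all.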

You can also use relative numbers. In my example above, it didn't really matter if the 11M total indexed page count was accurate. What mattered was that I was able to isolate a large section of the index based on one common piece of site architecture. Assumedly, the margin of error for each of those measurements was similar – I was only interested in the relative percentages at each step. When in doubt, take more than one measurement.

Keep in mind that this problem isn't unique to the "site:" operator – all search result counts on Google are estimates, especially the larger numbers. Matt Cutts discussed this in a recent video, along with how you can use the page 2 count to sometimes reduce the margin of error:

The True Test of An SEO

If you run enough "site:" combos often enough, even by hand, you may eventually be greeted with this:

If you managed to trigger a CAPTCHA without using automation, then congratulations, my friend! You're a real SEO now. Enjoy your new tools, and try not to hurt anyone.

116 Comments

Another cool thing about the site command that I just recently discovered: it works with Image Search and Video Search as well! So to find all the images on a specific website, just use "site:domain.com" and then click over to the images tab, and it'll display all the images hosted on that domain (that have been indexed by Google). The same thing works with videos as well, except it will show not only videos hosted on the domain, but also videos that are embedded in the site from sites like YouTube. And of course you can narrow down your image/video search query using any of the modifiers already mentioned in this post.

Certainly the site: operator can be a great time saver in quickly identifying duplicate content or indexation issues. On your last point about the numbers varying significantly between datacenters what I've found is that they can vary even more when checking on different local engines.

For instance, running site:johnlewis.com on Google.co.uk returns 890K results. Running the same query on Google.com returns 408K results, and doing the same on Google.com.au returns 1,750K results. In this case the returned results do not make much sense, as this is a UK site which appears to have twice as many pages indexed in Google.com.au compared to Google.co.uk.

Yeah, it can get pretty weird across ccTLDs. I try to stick to the most relevant one and track numbers over time if I really want some confidence in the absolute number. Day-to-day fluctuations for just site:example.com can be wild.

Sir, I have noticed some sudden drops when checking site:bizandlegis.com in google.com and in google.co.in. Sometimes it shows more than 3,800, but sometimes the count drops to below 2,000 (especially on Thursday morning). I think the Google algorithm reduces our visibility when they apply stricter values. Is that right?

Hey Modesto, did you search using a VPN to make Google think you are searching from different countries? I am asking because I went to google.co.in, google.com, google.co.uk and google.com.au without using any VPN from India and got the same result on all the cc-level Google domains for a "site:" search that I performed.

Another really simple one which I don't see people using too often is: site:domain.com -site:www.domain.com. A great way to see if you have, for example, content being indexed from more than just the "www" sub-domain.

Nailed it Doc, pretty much every site: operator one would need. Another site: combo I find myself using quite often when I'm searching for a country-specific website is site:.(TLD). When you're looking for a Dutch website, for instance, the operator would look like this: site:.nl "query"

The funny thing is that when the Google captcha gets triggered, it affects the entire IP range at the company where I work. The first time that happened, everybody freaked out: "Google is broken" :)) Now that they know who to blame every once in a while, I'm the most hated guy in the office :))

Operators are really helpful in simplifying and refining search. And not only for users: for the webmaster community, too, they have become a master tool for optimizing their tags relevant to these operators.

The CAPTCHA comment gave me a great laugh; I'm excellent at getting hit with those. I learned that Google contacts our IT team to specifically point out that (apparently) I could be a spammer. Month after month, I raise their flag. Not only do I now feel official, but given all the expanded site operator offerings, you may help get me blacklisted altogether!

I'm curious if you could solve the problem by using multiple IP addresses when doing these searches (to avoid the captcha problem). But will this swing the results wildly, like the ccTLDs? I guess the point of the exercise is to spot stuff... as Dr Pete mentioned.

Have you ever noticed that the number the site: operator returns often varies just by clicking through the paginated SERPs? Usually I click to the last page to find the number I consider "most accurate". I wonder how that varies over time and IPs...

With regards to searching for parameters with inurl operator and this comment:

"Try out the example above, and you'll notice that "inurl:ref"
returns any URL with "ref" in it, not just traditional URL parameters.
Be careful when searching for a parameter that is also a common word."

If you add an = sign at the end of the parameter name, the margin of error (picking up wrong URLs) will be greatly reduced.

I've honestly found that, in most cases, Google just ignores the "=" - I still get URLs like: "/Pocket-Ref-Edition-Thomas-Glover/sim/1885071620/2". I still add the "=" when I'm looking at URL parameters, but I'm not completely convinced it makes a difference.

Hey Dr Pete, I realise this post was published more than a year ago and the comments are probably not read anymore, but I was wondering how you would use Google Spreadsheets to run a bulk check of Google result counts using Google's site: operator? For example, I'd like to know how many results site:domain.com "cats" returns, and do it for all the animals I can think of. Thanks for the article anyway, still very helpful in 2014.

These are good, but you should try the query "site:* anywebsite.com" to see backlinks to your website or any competitor's. Google isn't showing as much these days, but you can still get some good info.

I quite often (a few times per week) check visitors and site information for my website (software-voorraadbeheer.nl) . The tool I use pulls info from Google using search operators like site: or info: (to sum our amount of indexed pages and more). And now since a little while I get Capthca request all the time. A way to prevent this and is this because of the tool? It is quite annoying. Can`t Google whitelist me or something? I search like 30 times a day on Google and it is taking some time. Strange thing is also that this is not always, just some days. Let me know!

Tools that are run on your computer that do automated queries will often trigger a captcha from Google, as they're not big fans of any query that isn't generated by a human. You'll also see captcha often at SEO conferences, when you have several hundred marketers on the same IP doing advanced searches.

They're not going to whitelist anyone for violating the terms of service (using an automated tool).

Thanks for the valuable info. Title tags are really the backbone of SEO. You can also do a title search within a website by using the site:xyz.com intitle:keyword query, as explained on http://techleaks.us/seo-search-query-operators-for-google-you-must-know/ ... Do you know how to use the time and weather operators? I would also like to know about using the + operator in Google search, if you can guide me?

You nailed it Doc Pete! Another good post from you. I really didn't realize that there were so many options. I will definitely bookmark this and share it with my friends. I am sure that they will learn a lot from this post of yours.

About the "site:" search operator... this morning I'm seeing weird behaviour on Google. Whatever domain you put after "site:", Google will show just 3 pages and the well-known message: "In order to show you the most relevant results, we have omitted some entries very similar to the 30 already displayed. If you like, you can repeat the search with the omitted results included." I mean "site:wikipedia.org", "site:seomoz.org"... am I missing something? I guess so...

Hi Keri, I'm just wondering now that I read this. I had this once searching on some keyword and Google limited my search results to like 60 pages, although it mentioned '45,000' results. Can I search further than the 600th result?

PS. You're probably thinking I'm spamming this site because this is like my third comment in 15 minutes, but this post got my interest :)

Ok, I agree it is not usual to search further than the 60th page, but I just came across this. If it is an estimation, Google is not good at it. Giving 600 results and estimating 45,000 is quite a difference ;)

This was such a good article (and a needed one) that I actually signed up here to post and say thanks... I also have a question I don't think was answered above....

If I wanted to use the 'allinurl: / inurl:' operator and at the same time make sure that I didn't get any folders or even subdomains, would anyone know how to do this?

Allow me to provide an example:

If I was looking for the word 'dog' within URLs, then the kind of results I WOULD want would be URLs like "www.dog122.com", "www.dogscats.com" or "www.dogsaregreat.com"... but the results I WOULD NOT want would be URLs like "www.website.com/dogs" or "dogs.website.com"... would anyone know how to set this up? Any help appreciated... I think this would be a neat trick :)

All I want is home/index pages, though. In fact I don't even want index.html in the results; I just want domain-only results... Are there any operators that can help me do that? I can't get top-level-domain results by using "-inurl" because the variables would be endless... If possible, I'd even like to avoid sub-domains as an added option.

You can use site:/-site: to isolate the root domain, but there's no way to tell Google you want to include/exclude a word in just the sub-domain but not at the folder/page level, that I'm aware of. Quite often, what you do end up doing is chaining together a string of negative operators until you've at least got something pinned down to a manageable size. It's not pretty, but it'll often help you sort out the big issues.

Actually, try this - it'll need some tweaking, and I can't test it fully without knowing the real situation, but you could use inurl, -site, and wild-cards. For example:

inurl:dogs -site:*dogs.*

That should kick out anything with "dogs" as a sub-domain or at the end of the root domain. Wild-cards are a bit unpredictable with Google, so you'll need to try it out for your specific case and experiment. You could use multiple -site operators with different wild-cards, too (that may work better than cramming multiple wild-cards into one string).

This query "inurl:dogs -site:*dogs*" would appear to give me exactly the same results as "inurl:dogs"

I've found that in theory a lot of things would be possible when combining different operators... However, more often than not, 2 or more operators will not work in conjunction with each other.

I'm assuming that Google has stopped us from finding every root domain according to a keyword... and that it could have to do with intellectual property... or, more to the point, Google not wanting to encourage more DMCA notices than they already get.

I'm afraid this idea will have to go the same way as "achieving more than 100 results in a CSE" - both of which, btw, I'd pay good money for (were it possible and legal, of course).

Yeah, sorry about that - I realized an hour or so later that it was just coincidence. The first 3 pages or so of the SERP weren't showing any domains with "dog", but that was just dumb luck. The operator [site:*dog*.*] doesn't return results. Oddly, [site:*dog.*] does work, as well as [-site:*dog.*], but it only includes (or excludes) root domains that end in "dog". There's no "insite:" or "indomain:" operator, and wild-cards are always a bit odd with Google.

There's an explanation here that seems to be accurate, given further testing:

Google's result counts are estimates, and the estimates on page 1 tend to have a bit more error than if you drill down (something they've officially admitted). In some cases the result count on page 2 can look a lot different. If you drill down and see 1,060 on both pages, though, then the error is probably very low and this number may be fairly reliable.

Thanks for taking the time to compile a great post. The site operators and advanced search operators are aspects of SEO that are often overlooked, as there are a growing number of tools that will get the results for you without you needing to know how they gather the data. Thanks again for the timely reminders. Sean

I had used this command to see backlinks: site:http://www.example.com -site:http://www.example.com. But this command is not working... is there anything I am missing, or does anyone have a good idea for checking backlinks through a search operator in Google?

Great compilation, Pete. As a student I was introduced to using filetype (pdf, excel). Today I find Google Operators useful (in a non-SEO sense) to figure out what our competing online stores are doing on the merchandising front. It's sometimes as efficient as running an SQL query on their database.

I was aware of some of them, but some combos are totally new. In fact, these combos are very effective. As an SEO analyst, these combos play a very important role for me. I am extremely happy and thankful to you.

Great article as always... but for the rest of our SEOmoz lovers I would like to recommend enrolling in Google's Power Searching course to learn most of the search features like the ones listed in this article. Visit: http://www.powersearchingwithgoogle.com/course/aps

Great tips, looks much better than a tool, Dr. Pete. I think it must be helpful for SEO experts like me :) Anyway, I was just curious about backlinks; still not sure how to get actual backlinks for a website if I don't have webmaster access.

I like to use a bookmarklet for this to go straight from current url i.e. example.com/blog to make the query with one click. Then I add more to the query if required, say to find all indexed blog posts in a category - site:example.com/blog "Filed in categoryname"

Very useful variations so thanks for sharing. I think I've been guilty of not using the site: operators enough, so will have to use some of these for doing audits. It's quite easy to rely on the shiny tools/software but it feels a bit more detailed to grab specifically what you are looking for.

great post Pete. I use many of these variations during my audit cycle. Just using the site: operator with the -www can reveal sub-domains the client failed to mention. I also find playing with the site: method invaluable to get a quick look at how many pages are split across various sub-domains.