If this is your first visit, be sure to
check out the FAQ by clicking the
link above. You may have to register
before you can post: click the register link above to proceed. To start viewing messages,
select the forum that you want to visit from the selection below.

Enjoy an ad free experience by logging in. Not a member yet? Register.

Distinct Count of values, but need a wildcard

SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;

It will give me the count of each occurrence of a distinct value of a field. However, I am wondering if there would be a way to modify this to have a wildcard.

The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.

Not sure it is possible but figured I would ask since Google searching didn't really give me anything useful to go off of.

SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;

It will give me the count of each occurrence of a distinct valueof a field.

*** NOT TRUE! ***

It will count *ALL* records *PER NAME*.

If you wanted a count of DISTINCT VALUES then you would need to use:

Code:

SELECT name, COUNT( DISTINCT fieldname ) AS count
FROM tablename GROUP BY name ORDER BY count DESC

And if you wanted a count of NON-NULL values of a given field you would do

Code:

SELECT name, COUNT( fieldname ) AS count
FROM tablename GROUP BY name ORDER BY count DESC

But using COUNT(*) simply counts *ALL* records. Period.

***********

Having cleared that up...

The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.

So, more simply stated, you want to group by domain name alone. Yes?

There's no simple way build into MySQL. It's true that MySQL has regular expressions, which would seem the ideal way, but unfortunately they are limited to use with the RLIKE operator.

So you could easily search for all occurrences of codingforums.com for example, but FINDING all the domains is much trickier.

A lot of it will depend on how "regular" the contents of that field are. Does the field *ONLY* contain URLs? Or does it contain URLs embedded within text? e.g.,

Then I think we can do this. But what about a url such as http://download.microsoft.com? That is, a subdomain?

Are you worried about them? I think it would be tough to create a simple expression to look for those, as well.

Ignoring subdomains, this SEEMS to work:

Code:

SELECT
SUBSTR(
REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''),
1,
LOCATE('/',REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''))-1
) AS domain,
COUNT(*) AS howmany
FROM urls
GROUP BY domain
ORDER BY howmany DESC

Users who have thanked Old Pedant for this post:

First, we concat a '/' to the end of the URL, just in case there is none there already.

Then we replace 'http://' and 'https://' and 'www.' with nothing.

That gets us down to something like codingforums.com/gobbledygook/

So then we use LOCATE to find the first '/' in that (which MIGHT be the one we added at the first step).

And then we use SUBSTR to get all the characters up to but not including that first '/'.

Make sense?

It does make sense, however I double checked my table and it does contain subdomains as well. Perhaps if it were possible to also ignore the first part of the subdomain as we did with the www., then as long as I could display the entire field at the end, I think I would get good results. I don't see any duplicate domains that have different sudomains....it seems if I have download.microsoft.com, that is all I have, I don't have ie.microsoft.com as far as I can tell.