the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)

the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)

Hi and thanks for the reply.

Yes, there are duplicate URIs; that's an integral function of the application.

the count of distinct ids for each bookmark should always be 1... i just threw that in there in case you might have duplicates (which you can avoid by defining a UNIQUE index on the uri, and then you won't need this count)

I got an error relating to a non unique alias for the links table, which I removed from the FROM. But then I got another error:

Needless to say, there most certainly is a column called 'id' for the bookmarks table.

You'll perhaps notice that there's a duplicate (links.bookmark_id = bookmarks.id) in the statement; that's a hold over from the previous statement. Even if I remove it, I still get the error.

r937
—
2011-02-23T16:07:08Z —
#7

okay, we'll have to deconstruct your latest query to focus on what you're really after

what i gave you in post #3 was based on your original requirements, but your latest query adds all kinds of extra stuff, much of which is quite incompatible with the original GROUP BY clause

let me give you an analogy to illustrate the conceptual problem you're having

suppose you have a school consisting of multiple classrooms, where each classroom has multiple students

now i will ask you to write a query which returns the number of students in each classroom, along with the student's last name

your first and immediate response should be "which student?"

it is the same issue -- when you have a GROUP BY clause, every column in the SELECT clause must either be mentioned in the GROUP BY clause or be the argument of an aggregate function, like COUNT() or MAX()

do you see the issue?

Forbes
—
2011-02-23T16:13:56Z —
#8

r937 said:

okay, we'll have to deconstruct your latest query to focus on what you're really after...

yeah, it is down to mixing the (deprecated) comma-list join syntax together with explicit JOIN syntax

but we will have to tear all that apart anyway...

Forbes
—
2011-02-23T16:31:41Z —
#10

r937 said:

Do you see the issue?

Yes, sort of.

Why would I be asking you which student? The whole purpose of this query is to just return the names of those in the same classrooms, not to return a specific student by name.

That aside, what do you want me to do? I' assuming the statement will need carving into two, or something like that.

r937
—
2011-02-23T16:53:36Z —
#11

Forbes said:

Why would I be asking you which student?

because a GROUP BY on the classroom returns a single aggregate row for the classroom, and since there are multiple ~different~ student names in a classroom, asking for one of them, without stating which one, is nonsensical

similarly, you have both

, GROUP_CONCAT(links.user_id) AS users
, links.user_id

in the SELECT clause -- the first one is okay, because it aggregates all the users into a single string, but the second one isn't, because it's not an aggregate function

you mentioned that there could be duplicate urls in this table -- may i ask why?[/quote]Sorry about the late reply — life!

I'm working on a bookmarking service, so it's a function of that service to allow people to add the same bookmark.

Over time, the number of times a bookmark has been added will build towards a ranking algorithm.

r937
—
2011-02-24T11:29:13Z —
#13

okay, so given that a particular url can be in the table more than once, will ~all~ rows for the same url have identical titles and snippets and datetimes?

presumably the answer is no

so let me ask you then, if you were to aggregate/collapse all the rows for a particular url into one result row, which title or snippet or datetime would you like to see? keep in mind the student name analogy for the classroom count

Forbes
—
2011-02-24T11:45:12Z —
#14

r937 said:

okay, so given that a particular url can be in the table more than once, will ~all~ rows for the same url have identical titles and snippets and datetimes?

So far, Under Cloud can handle duplicate bookmarks on an individual basis, like the aforementioned.

r937 said:

so let me ask you then, if you were to aggregate/collapse all the rows for a particular url into one result row, which title or snippet or datetime would you like to see? keep in mind the student name analogy for the classroom count

As with the example I provided, the first instance would be the proper bookmark to use, as all others are more recent.

So in the index view, when a duplication is discovered, the very first bookmark would be used, along with the date and time by which it was added.

I suppose the next logical question to you would be, are the duplicates detected within the rows retrieved within the remit of the LIMIT clause, or across the whole of the table?

r937
—
2011-02-24T11:58:29Z —
#15

Forbes said:

So in the index view, when a duplication is discovered, the very first bookmark would be used, along with the date and time by which it was added.

not sure what you mean by index view

here's the query which counts the bookmarks, augmented with the "earliest" data...

1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'ON (bookmarks.url = source.url) AND (bookmarks.datetime = source.earliest) AND (' at line 1

From what I can grasp, MySQL doesn't know to handle the various references to columns belonging to the links and tags tables, and I don't know where to put the table names.