BP bases site data/activity entries on WP search engine instructions

Description

If the WP dashboard option under Settings > Reading that discourages search engines from indexing the site (via robots.txt and meta tags) is checked, BuddyPress uses that option's value to decide whether it shows or records activity stream data.

My contention is that this behaviour is wrong!

Suppose I have a site that is essentially a private community with sub blogs. I have checked the box because I do not want Google indexing them, but I do want those sites to record their activity. At present this can't happen unless I uncheck the option and let search engines index the site.

BP appears to take the WP option 'blog_public' literally, even though, as far as I can see in /wp-includes/functions.php L:1077 and /wp-includes/general-template.php L:1709, WP uses the option value to set robots.txt and meta tags respectively.
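As a rough illustration of what the option amounts to, here is a simplified Python model of the two outputs mentioned above. It is not WordPress code: the function names are hypothetical, and the exact robots.txt and meta-tag text WordPress emits varies between versions.

```python
# Simplified, hypothetical model of how WordPress turns the 'blog_public'
# option into crawler instructions. Not WordPress source; real output
# differs between WordPress versions.


def robots_txt(blog_public: str) -> str:
    """Return a simplified robots.txt body for the given option value."""
    lines = ["User-agent: *"]
    if blog_public == "0":
        # "Discourage search engines" is checked: ask every crawler to skip the site.
        lines.append("Disallow: /")
    else:
        # Public site: an empty Disallow means "nothing is disallowed".
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def robots_meta(blog_public: str) -> str:
    """Return the robots meta tag for <head>, or an empty string for public sites."""
    if blog_public == "0":
        return "<meta name='robots' content='noindex,nofollow' />"
    return ""


if __name__ == "__main__":
    print(robots_txt("0"), end="")
    print(robots_meta("0"))
```

Both outputs are requests addressed to crawlers; nothing in them stops anyone, human or bot, from fetching the pages.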

In bp-blogs-functions.php l:141 we check the value of $_POST['blog_public'] to determine whether blog activity is recorded to the activity stream.
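The gate described here can be sketched as a small Python model. The function names are hypothetical, and treating a missing or empty value as "not public" is an assumption about the PHP check, not confirmed BuddyPress behaviour.

```python
# Hypothetical model of the gate described above: a blog's activity is
# recorded only when the submitted 'blog_public' value marks it as public.
# Names are illustrative; this is not BuddyPress code.


def is_public_submission(post_data: dict) -> bool:
    """Treat a missing, empty or "0" value as "not public" (assumed semantics)."""
    return post_data.get("blog_public", "") not in ("", "0")


def record_activity(stream: list, blog_id: int, text: str, post_data: dict) -> bool:
    """Append (blog_id, text) to the stream unless the blog discourages indexing."""
    if not is_public_submission(post_data):
        # Search engines discouraged, so the activity is silently dropped.
        return False
    stream.append((blog_id, text))
    return True
```

With `{"blog_public": "0"}` the call returns `False` and nothing is appended, which is the behaviour this ticket objects to for a private community that still wants its own activity stream.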

My argument is that using this option or POST value to decide what to do is wrong. We can't tell whether the user set the option for the right or wrong reasons, and in any case the option simply sends an instruction to search engines; it has no formal power and doesn't have to be honoured. BP is effectively saying that, because this option has a value, the user must not want their sub site activity recorded across the primary BP site, and that feels like a bit of an assumption.

I can understand the argument that users may indeed want to set their blog as 'Private', and that this is the only clue BP has as to the user's wishes, but it's not a good one, given that the 'blog_public' option really has little to do with privacy in the strict sense.

I also realise there is likely little that can be done to effect a change, as an option would need to be provided on individual blogs, and it would most likely need to be a new option altogether.

However, I first ran across this issue when a user reported that, IIRC, forum (bbPress) posts were not displaying in the activity stream, and that was on a non-multisite install.


Change History (9)

The reason we use this is it's the only available WordPress setting that comes close to being what we want. If we're going to introduce the concept of blog-visibility, I'd like to have real access control instead of just a piece of meta-data.

I realised this was the reason it was used, and I agree it would need to be real access control rather than this piece of metadata, which is what worried me: metadata of this form and purpose isn't suitable, as it isn't defined as privacy. I just wanted to get the 'issue' logged, as this is the second time it has come to my attention as mildly problematic under different circumstances. It's easy to work around, but nonetheless...

Further thoughts:
I'm going to argue that the principle of hiding this activity data based on this meta option ought to be removed. If it's mooted that it is not the best approach, why is it being used? Why indeed is there the notion of privacy attempted when it's based on a setting that I would hazard a guess few site owners/users are aware of, at least as a setting that produces this activity behaviour? Do users expect to be able to hide blogs? Would they if it were a plain vanilla WP/MS install? This is a perceived requirement that BP introduces, yet we don't explicitly offer the option; we don't say "check this box it will remove blog activity from the stream".

> If it's mooted that it is not the best approach why is it being used, why indeed is there the notion of privacy attempted when it's based on a setting that I would hazard a guess few site owner/users are aware of

I've always understood the logic thus: If your blog posts are sent to the network activity feed, they will be indexed by Google (assuming that your BP_ROOT_BLOG is open to crawlers). So, if you have actively marked your blog as no-robots, then it follows that you wouldn't want your content crawled on the activity stream either.

I agree that it's perhaps not the ideal setup, but simply reverting it could be even worse. Right now, the worst that happens is that some activity doesn't get recorded - annoying. If we ignored the blog_public setting, on the other hand, the worst that could happen is that people would unwittingly have what they thought were "private" blog entries showing up in search engines. This, IMO, is more than annoying: it's a violation.

> we don't explicitly offer the option, we don't say "check this box it will remove blog activity from the stream"

At a very minimum, we should filter the text on Settings > Privacy (or supersede it; I can't remember how filterable it is) to say "Please note that this setting will prevent your blog posts from appearing in the sitewide activity stream." Or perhaps we could add a whole other section to the Settings > Privacy screen, along the lines of:

Activity Settings
[x] I would like posts and comments from my blog to appear in the sitewide activity stream
[x] I would not like....
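One way the proposed "Activity Settings" box could be modelled is sketched below in Python. The per-blog flag `activity_opt_in` is hypothetical; when the owner has never set it, the sketch falls back to today's `blog_public` behaviour, so nobody's "private" posts start showing up unexpectedly (the concern raised in the previous comment).

```python
from typing import Optional


def should_record_activity(blog_public: str, activity_opt_in: Optional[bool] = None) -> bool:
    """Decide whether a blog's posts/comments go to the sitewide stream.

    activity_opt_in is a hypothetical, explicit per-blog setting:
      True  -> owner ticked "I would like posts and comments ... to appear"
      False -> owner ticked "I would not like ..."
      None  -> owner never chose; keep today's behaviour based on blog_public.
    """
    if activity_opt_in is not None:
        # An explicit choice wins, regardless of the search engine setting.
        return activity_opt_in
    # Fallback: hide activity when search engines are discouraged.
    return blog_public != "0"
```

The fallback leaves existing installs unchanged; only owners who explicitly tick a box decouple activity recording from search engine visibility.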

> I've always understood the logic thus: If your blog posts are sent to the network activity feed, they will be indexed by Google (assuming that your BP_ROOT_BLOG is open to crawlers). So, if you have actively marked your blog as no-robots, then it follows that you wouldn't want your content crawled on the activity stream either.

My problem here would be "your blog": it seems from the BP function that we aren't actually checking the blog's option value, so we can't differentiate between blogs; we simply impose a blanket rule that no sub blog can feed into the root blog activity feed. Perhaps one blog does want to while another doesn't?

> <snip> If we ignored the blog_public setting, on the other hand, the worst that could happen is that people would unwittingly have what they thought were "private" blog entries showing up in search engines

The danger here is twofold. First, 'blog_public' might suggest something other than what it really is: an instruction to search engines that is not mandatory. Second, having people assume, even for a minute, that this setting controls privacy is in itself dangerous. Nothing about this setting truly has anything to do with privacy, and any user who assumes it does is technically being misled. Even if BP prevents the activity from being recorded, that does not prevent search engines from reaching the content by some other avenue, and somewhere that content will have been indexed unless one fancies gathering a list of the bad bots and adding them to one's .htaccess file. However, I realise that if the blog_public setting is false, we do need to pay heed to it somewhere, somehow.

> At a very minimum, we should filter the text on Settings > Privacy (or supersede it - I can't remember how filterable it is)

It does appear filterable, or at least a hook is provided, 'blog_privacy_selector', which oddly changes the nature of the settings block when hooked into. I'm not sure why, but it does let one pass some additional text.

That would probably be best run from bp-blogs-functions.php, based on the state the blog is set to, e.g. flipping the message between 'Site visibility is set to discourage search engines, so BP is not adding this blog to the activity feed' and 'Enabling this setting will remove this blog from the activity feed'.
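A minimal sketch of that message flip, as a hypothetical Python function. In WordPress the chosen text would be echoed from a callback attached to 'blog_privacy_selector'; that PHP wiring is not shown, and the wording is only a suggestion.

```python
def privacy_notice(blog_public: str) -> str:
    """Hypothetical extra text for the search engine visibility setting."""
    if blog_public == "0":
        # Search engines are already discouraged: explain the current effect.
        return ("Site visibility is set to discourage search engines, "
                "so BP is not adding this blog to the activity feed.")
    # Blog is public: warn about what ticking the box would do.
    return "Enabling this setting will remove this blog from the activity feed."
```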

Looking further at the functions in bp-blogs-functions.php, I do see functions that take a param to set a blog as 'not tracked', but I feel the $blog_id ought to be checked with get_blog_option($blog_id, 'blog_public'); then it becomes a viable check for individual blog settings.
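To make the per-blog point concrete, here is a small hypothetical Python model: each blog's own stored value decides whether it is tracked, so one sub blog can opt out while another still feeds the root stream. The `options` dict stands in for what get_blog_option() would return, and treating an unknown blog as not public is an assumption of this sketch.

```python
# Hypothetical per-blog check. `options` stands in for each blog's stored
# options (what get_blog_option($blog_id, 'blog_public') would return in
# WordPress); it is a plain dict here so the sketch runs on its own.


def lookup_option(options: dict, blog_id: int, name: str, default: str = "0") -> str:
    """Look up one option for one blog; unknown blogs/options fall back to default."""
    return options.get(blog_id, {}).get(name, default)


def blogs_to_track(options: dict, blog_ids: list) -> list:
    """Keep only blogs whose own 'blog_public' value allows activity tracking."""
    return [b for b in blog_ids if lookup_option(options, b, "blog_public") != "0"]


example = {
    1: {"blog_public": "1"},  # root blog
    2: {"blog_public": "0"},  # sub blog that discourages search engines
    3: {"blog_public": "1"},  # sub blog that is happy to be indexed
}
```

Here `blogs_to_track(example, [1, 2, 3])` keeps blogs 1 and 3 and drops only blog 2.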

> it seems from the bp function that we aren't actually checking the blog options value so can't differentiate between blogs, we simply place a blanket dictate that all sub blogs won't be able to feed into the root blog activity feed

Got ya; like I said, I needed to read all of the functions in more detail. Ignore the latter wafflings, as they are a digression. The issue is as originally outlined, albeit not easily dealt with. However, the settings are hookable, so it would be possible to at least add a little extra text to help users understand the wider consequences of search engine visibility, or even, through the do_action, add a BP setting specifically for making activity visible.