
I'm responsible for the OJS installation at http://forumoswiatowe.pl. The first journal issue was published recently; however, the website is still invisible to various web crawlers and bots. I've done some research and was perplexed to see that some services report strange errors, such as missing meta tags, which suggests that the document being accessed is empty. However, I can still open the website in several browsers without any problems.

I believe the website's invisibility in search engines is caused by something blocking bots and crawlers from indexing it. So far I've checked the robots.txt and .htaccess files for errors, with no success. robots.txt disallows only /cache/, and I have a long list of bots excluded from the site in the .htaccess file, which I also temporarily removed to check whether it was stopping Googlebot or other services from indexing the website. It wasn't.
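For reference, a robots.txt that blocks crawlers only from the cache directory, as described above, would look like this (a minimal sketch; the actual file on the server may contain additional directives):

```text
User-agent: *
Disallow: /cache/
```

Such a file would not prevent Googlebot from reaching the journal's pages, which is consistent with the observation that removing the bot exclusions changed nothing.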

Does anyone have a suggestion of what might be the cause of this problem?

I can't think of anything in OJS that would cause this; it may be a web server configuration/security/filtering issue. I'd suggest getting a tool or web browser plugin that allows impersonation of different user agents; in Firefox, for example, there's a "default user agent" extension that can do this.
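The same impersonation idea can be sketched without a browser extension, e.g. with Python's standard urllib (the Googlebot User-Agent string below is the commonly published one, but treat the exact value as an assumption):

```python
import urllib.request

# Commonly published Googlebot User-Agent string (treat the exact value
# as an assumption; Google has used several variants over the years).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def crawler_request(url, user_agent=GOOGLEBOT_UA):
    """Build a request that impersonates a crawler's User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = crawler_request("http://forumoswiatowe.pl/")
# urllib.request.urlopen(req) would then fetch the page as "Googlebot";
# here we just show the header that will be sent.
print(req.get_header("User-agent"))
```

If the response body for the crawler's User-Agent differs from what a normal browser receives, the server (or a security filter in front of it) is treating bots differently.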

When I paste "http://forumoswiatowe.pl/index.php/czasopismo" into the W3C validator, it opens the file and reports that the markup is valid (no empty-file errors), but it gets an empty response for "http://forumoswiatowe.pl". It looks like web crawlers and other bots somehow aren't redirected properly. Since there's only one journal on this particular OJS installation, I've enabled the site-wide redirect to it. However, since the PHP engine performs the redirect, there may be some misconfiguration; unfortunately, my knowledge of Apache configuration is too shallow to diagnose it myself. I've also temporarily removed .htaccess (which contains a www-to-non-www redirect) to see if it was conflicting, but that didn't fix the problem.

Perhaps one solution would be a hard redirect from "http://forumoswiatowe.pl/" to "http://forumoswiatowe.pl/index.php/czasopismo", but I don't know whether OJS's internal redirect mechanism would kick in before .htaccess.
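A hedged sketch of such a redirect in .htaccess (assuming mod_rewrite is enabled; mod_rewrite rules are processed by Apache before the request ever reaches PHP, so this would take effect before OJS's own redirect):

```apache
RewriteEngine On
# Redirect only the bare domain root to the journal's index page.
RewriteRule ^$ /index.php/czasopismo [R=301,L]
```

A 301 here is also crawler-friendly, since it's a standard HTTP redirect that search engines follow and treat as permanent.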

The reason for this is that "Refresh" is supported by web browsers but may not be supported by crawlers or software like wget or curl. The administrator also tried to access the website with wget with debugging enabled, and here's the response:
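The behavioral difference can be illustrated with a small sketch of crawler-style redirect handling: a standard Location header on a 3xx response is followed, while the non-standard Refresh header is ignored, leaving the crawler looking at the (empty) body:

```python
def crawler_next_url(status, headers):
    """Return the redirect target a typical crawler or wget would follow.

    Location on a 3xx response is part of the HTTP standard and is
    followed; Refresh is a browser convention and is generally ignored
    by crawlers and command-line tools.
    """
    if 300 <= status < 400 and "Location" in headers:
        return headers["Location"]
    return None

# OJS-style Refresh redirect: the crawler sees status 200, an empty
# body, and no redirect target to follow.
print(crawler_next_url(200, {"Refresh": "0; url=http://forumoswiatowe.pl/index.php/czasopismo"}))
# A standard Location redirect is followed.
print(crawler_next_url(301, {"Location": "http://forumoswiatowe.pl/index.php/czasopismo"}))
```

This matches the symptom reported above: browsers (which honor Refresh) render the journal fine, while validators and bots see an empty document at the root URL.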

Everything works well after I changed Refresh to Location. However, I wonder why other OJS installations are free of this problem. I'd like to hear from the developers whether it's safe to leave this line of code changed, or whether it may compromise the journal's security. Perhaps there's a better way of dealing with this problem? I'm also worried about future updates to OJS: will I have to update this file manually each time?

We used to use Location: headers but changed to the current Refresh: per http://pkp.sfu.ca/bugzilla/show_bug.cgi?id=3520. That change was quite a while ago, so I'm surprised nobody has reported crawling problems until now. I'd be interested in potentially returning to Location: as it's more standards-friendly, but only if it doesn't cause major problems with IE. I'll see if we can do some testing with the team on this. Meanwhile, changing back to Location: won't have any other side effects.

Two weeks after the problem was fixed on forumoswiatowe.pl, Google still won't index the website, despite Google Webmaster Tools being set up. I've tried fetching particular sub-pages with Google and submitting them for re-indexing, and even generating a sitemap.xml. Still, the website doesn't seem to be indexed at all. In fact, when I check the indexing status (advanced view), I can see 32 pages that were not chosen to be indexed (shown in green), which is strange, since a journal with several articles has already been published on forumoswiatowe.pl.
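For reference, a minimal sitemap.xml in the standard sitemaps.org format would look like the sketch below (only the journal's index page is shown; a real sitemap would list every issue and article URL):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://forumoswiatowe.pl/index.php/czasopismo</loc>
  </url>
</urlset>
```

Submitting such a file through Google Webmaster Tools tells Google which URLs exist, but it does not guarantee that they will be indexed.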

Does anyone have any suggestion as to what might be the cause? Could the redirect from the root to index.php/czasopismo be the culprit? The only other redirect I have set up is www to non-www in .htaccess. However, the website in Google Webmaster Tools is configured as forumoswiatowe.pl, not http://www.forumoswiatowe.pl, which means that indexing shouldn't even trigger the www-to-non-www redirect.

I've checked Google Webmaster Tools and it looks like Googlebot doesn't have any problems reaching the site. Also, robots.txt isn't blocking anything other than the /cache/ directory (and caching is off anyway). What worries me is that in the advanced view I can see that up to 32 pages are being ignored because they are either very similar to each other or have too many redirects (or redirect to pages with similar content).

I'm worried that a basic redirect from domain root to journal might be the culprit.

Can you confirm that you're seeing this behavior even after the redirect type has been changed?

I doubt that a root redirect would cause this, as it's not unusual behavior, but I'm not privy to Google's internal decision-making. One way you could test this is to disable the redirect in Site Administration; that way the homepage will include a link to your journal rather than redirecting directly to it.

I'm happy to report that the problem with this particular OJS installation was resolved after Google admitted that the domain had previously been blocked for being a link farm. Obviously, I made a mistake by not thoroughly investigating whether forumoswiatowe.pl had been misused before it was registered with the intention of creating an e-journal. I'm already seeing some positive feedback in Google Webmaster Tools, and two sub-pages have already been indexed by Google, so I believe it's now just a matter of time before all pages become indexed and visible in organic search.

In any case, I'm happy that my investigation led to the discovery of a redirect problem in OJS. It's great to know that OJS is evolving.

I would also like to thank everyone for their support. I haven't seen such a quick and responsive community in quite a long time.