Keep robots off Gallery’s CPU-intensive pages
Googlebot indexing Gallery’s slideshow.html single-handedly slowed our server to a crawl. Each visit to slideshow.html produces one Apache process and one MySQL process, both pegged at maximum CPU usage. And since Googlebot visits are quite frequent, the processes kept piling up: new ones were added before the earlier ones had finished processing their queries. Loads of 30 were not uncommon, so something had to be done…
On top of being very CPU-intensive, slideshow.html is also useless to a search engine, since the information it provides is redundant with the regular photo pages. So the administrator has every reason to keep robots from visiting such pages. It took a discussion with h0bbel to build a working solution, which I then recorded in Gallery’s wiki. Thank you, h0bbel! Here is how it goes:
With the URL rewrite module, the default slideshow URL has the following form: “/v/my_album/my_sub_album/my_photo.jpg/slideshow.html”. The problem is that the original robots.txt syntax matches URLs by prefix only, so there is no way to exclude a URL based on how it ends. To make the URL excludable, some URL rewriting is required.
Happily, there is no need to fiddle with mod_rewrite directly, as the nifty rewrite module can handle the details itself. By default, the “View Slideshow” rewrite target is “v/%path%/slideshow.html”. The constant slideshow marker (“/slideshow.html”) sits to the right of the variable path (“%path%”), and this is why we could not express the slideshow ban in robots.txt syntax. Reversing this order gives us an excludable URL.
So we change the rewrite target for “View Slideshow” from “v/%path%/slideshow.html” to “v/slideshow/%path%”, and then add “Disallow: /v/slideshow/” to the site’s robots.txt. If you use the PATH_INFO mode of the URL rewrite module, then this will be “Disallow: /main.php/v/slideshow/”.
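To see why moving the constant part to the front matters, here is a minimal sketch that feeds the rules to Python’s standard-library robots.txt parser, which matches by prefix. The host name gallery.example.com and the “User-agent: *” line are assumptions made for illustration, and this parser is only a stand-in for a prefix-matching crawler, not Googlebot’s actual implementation.

```python
from urllib.robotparser import RobotFileParser

# Rules from the post, plus an assumed "User-agent: *" line so they apply to every robot.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /v/slideshow/",           # short-URL mode
    "Disallow: /main.php/v/slideshow/",  # PATH_INFO mode
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# Hypothetical host; the paths follow the forms described in the post.
BASE = "http://gallery.example.com"
old_slideshow = BASE + "/v/my_album/my_sub_album/my_photo.jpg/slideshow.html"
new_slideshow = BASE + "/v/slideshow/my_album/my_sub_album/my_photo.jpg"
path_info_slideshow = BASE + "/main.php/v/slideshow/my_album/my_sub_album/my_photo.jpg"
photo_page = BASE + "/v/my_album/my_sub_album/my_photo.jpg.html"

for url in (old_slideshow, new_slideshow, path_info_slideshow, photo_page):
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)
```

The old-style URL is still allowed because no rule shares its prefix, while both new-style slideshow URLs are blocked and the ordinary photo page stays crawlable.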
And that’s it: no more spiders hogging our precious resources in vain!
Now, in the absence of a centralized multisite administration tool, I still have the chore of deploying this solution on each gallery on my host… By the way, I wrote a feature request for a one-stop multisite upgrade. Use your Gallery forums account to vote for it!