Privacy in statistical web traffic reporting
I automatically generate daily statistical reports for my web sites' traffic using Awstats. Awed by Awstats' extensive reporting capabilities, I enabled everything with full details and let it run like that. Erik, one of my favorite contradictors, found that I may have gone a bit too far on that one. Of course I first dismissed that as one of his usual privacy rants: we have very different ideas of how much personal information we should let the public know about us. But a quick cost/benefit analysis showed that for once we actually had some common ground.
First he mentioned that my reports were indexed by search engines. I was aware of that, but I saw nothing wrong with it and had not even bothered to add a robots exclusion pattern. But having the statistical reports indexed brought no one any significant value: all users had other ways to reach them through links. So the benefit was zero. In addition, indexing pages that contain referer links promotes referer spam, and everyone knows how much I love to hate spammers. The cost/benefit analysis provided a clear conclusion and the corresponding robots.txt was therefore swiftly added.
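For the record, a minimal sketch of what such an exclusion could look like. The `/awstats/` path and the `DOCROOT` variable are assumptions for illustration only; the actual location of the reports depends on how Awstats is published on a given server.

```shell
# Hypothetical document root; defaults to a scratch directory so the
# sketch can be tried without touching a live site.
DOCROOT="${DOCROOT:-$(mktemp -d)}"

# Ask well-behaved crawlers to stay out of the statistics pages.
# "/awstats/" is an assumed URL prefix, not necessarily the real one.
cat > "$DOCROOT/robots.txt" <<'EOF'
User-agent: *
Disallow: /awstats/
EOF
```

Keep in mind that robots.txt is only a polite request: it keeps honest search engines away, but it does nothing against a crawler that chooses to ignore it.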
Then Erik mentioned the presence of IP addresses in the Awstats reports. I had never given those any thought, but the privacy breach was obvious: ill-intentioned organizations could easily track the users who indulge in a visit to my hall of deviant ramblings. My first reaction was to consider that whoever wants to hide can use an anonymous proxy or a Tor onion routing gateway. But Erik made me realize that we are dealing with the clueless masses. And as plenipotent semi-divinities with root access we have a duty to protect them from their own lack of clues.
Moreover, it occurred to me that this report is not very useful. I need the IP addresses as raw material to generate just about every piece of statistical data, but that can very well be done anonymously. The only redeeming value of the section of the report containing IP addresses is letting me know whether a handful of hosts are actually generating all the traffic. The value of this information strongly decreases as traffic reaches statistically significant numbers. So once again the cost/benefit analysis provides an easy conclusion: letting the hosts report go would not be too painful either. Ideally I would keep it in an anonymous form. But that would require modifying Awstats and I am not going to allocate resources to that today. So for now I am just going to tell Awstats to skip it.
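If I ever miss that information, the question of whether a handful of hosts generate all the traffic can be answered outside Awstats without publishing any address. What follows is only a rough sketch: the `top_host_counts` function is a hypothetical helper, and it assumes that the client address is the first whitespace-separated field of each log line, as in the common and combined log formats.

```shell
# Print the request counts of the N busiest hosts (default 5),
# one count per line, busiest first, with the addresses stripped.
# $1: path to an access log, $2: optional number of hosts to list.
top_host_counts() {
    awk '{ print $1 }' "$1" |   # keep only the client address
        sort |                  # group identical addresses together
        uniq -c |               # count requests per address
        sort -rn |              # busiest hosts first
        awk '{ print $1 }' |    # drop the address, keep the count
        head -n "${2:-5}"
}

# Example call (hypothetical log path):
# top_host_counts /var/log/apache2/access.log 10
```

Comparing the first few counts with the total number of lines in the log is enough to tell whether a few hosts dominate, and the output contains nothing that identifies a visitor.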
So here we go:
cd /etc/awstats
perl -i -pe 's/ShowHostsStats=PHBL/ShowHostsStats=0/g' *.conf
That's all, folks! I now just have to force the regeneration of all my web traffic reports. Good thing that all of that is now completely automated!
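Before forcing the regeneration, it may be worth confirming that the substitution caught every configuration file. It only rewrites lines that read exactly `ShowHostsStats=PHBL`, so a file using another value would be left untouched. A small sketch, where `check_hosts_stats` is a hypothetical helper name and `/etc/awstats` is the directory used above:

```shell
# List the ShowHostsStats setting of every Awstats configuration file.
# $1: configuration directory (defaults to /etc/awstats).
check_hosts_stats() {
    grep -H '^ShowHostsStats=' "${1:-/etc/awstats}"/*.conf
}

# Example call:
# check_hosts_stats /etc/awstats
```

Any line of the output that does not end in `=0` points at a file that still publishes the hosts report.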
To those who doubt that I can change my mind: I can readily do so, but I need to be convinced, either by myself alone or with the assistance of a third party. Let this be an example for those who have lost all hope of convincing me.