
An Apache monitoring script

One of the daily routines of a web server administrator is monitoring the server logs for various interesting things: error response codes (e.g. 500), attempts to access restricted pages, user agents, client IPs, popular pages, even image hotlinking from other sites. On Linux servers, this can easily be done with shell one-liners combining tools such as tail, awk, sort and cut. Below is a bash script I use which automates this kind of Apache monitoring for a given website.
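For instance, a typical one-liner of this kind prints the ten most frequent client IPs in a log (the log path here is just an example; adjust it to your setup):

# count hits per client IP in the access log (path is an example)
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10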

The script is pretty straightforward and simple to understand. All you have to do is edit the first six variables to your needs (domain, host, Apache log file, your email, etc.), then make the script executable and run it (see the example after the list below). It will display the following statistics page by page through more, and send them to your email as well:

  • Total entries (hits) in the given log file
  • Top-10 user agents in the last 600,000 hits
  • Top-10 IPs in the last 600,000 hits
  • Response codes breakdown in the last 600,000 hits
  • Top-10 URLs with 500 response code
  • Last 10 URLs with 500 response code
  • Top-10 URLs with 400 (bad request) response code
  • Last 10 URLs with 400 (bad request) response code
  • Top-10 URLs with 401 (unauthorized) response code
  • Last 10 URLs with 401 (unauthorized) response code
  • Top-10 URLs with 403 (forbidden) response code
  • Last 10 URLs with 403 (forbidden) response code
  • Last 10 URLs with 404 (not found) response code
  • Top-10 most popular URLs in the last 600,000 hits
  • Top-10 sites hotlinking your site’s images in the last 600,000 hits
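For example, assuming you saved the script as apache-monitor.sh (the filename is up to you), you would make it executable and run it as root, since it reads the Apache log and writes its report under /root:

chmod +x apache-monitor.sh   # script name is just an example
sudo ./apache-monitor.sh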

Note that your Apache logs must be in the default combined log format (with Referer and User-Agent fields), or else the grep/awk commands below will pick the wrong fields and fail!
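For reference, the combined format is defined in the Apache configuration like this:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

and produces lines like the following made-up sample:

66.249.66.1 - - [10/Oct/2015:13:55:36 +0300] "GET /index.php HTTP/1.1" 200 2326 "http://www.example.com/" "Mozilla/5.0 (compatible; Googlebot/2.1)"

The script relies on the status code being the 9th whitespace-separated field ($9 in awk) and on the request, referer and user agent being the 2nd, 4th and 6th quote-delimited fields.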

Here is the script. Copy/paste it into a file, change the MY* and ADMIN_EMAIL variables, and run it to see a comprehensive report of your website traffic:

#!/bin/bash
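# Daily Apache access log report for a single website.
# Edit the six variables below (domain parts, log file, admin email) to your needs.
# The report is written under /root, so the script should be run as root.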
MYFQDN="www.linuxinsider.gr";
MYTITLE="linuxinsider";
MYHOST="www";
MYTLD="gr";
MY_APACHE_LOG="/var/log/apache2/linuxinsider-access.log";
ADMIN_EMAIL="[email protected]"
DATESTAMP=$(date +"%d-%m-%y");
CHECK_LOG=/root/${MYFQDN}-apache-access-log-${DATESTAMP}.log
 
echo "APACHE LOG STATISTICS FOR " ${DATESTAMP} > ${CHECK_LOG};
 
FIRST_ENTRY=$(head -1 ${MY_APACHE_LOG} | cut -d\" -f1 | cut -d[ -f2 | cut -d] -f1)
echo "FIRST APACHE LOG ENTRY/HIT AT: " $FIRST_ENTRY >> ${CHECK_LOG} 
echo " "  >> ${CHECK_LOG}
 
TOTAL_ENTRIES=$(wc -l < ${MY_APACHE_LOG});
echo "TOTAL APACHE LOG ENTRIES SINCE THEN: " $TOTAL_ENTRIES >> ${CHECK_LOG} 
echo -n "...10%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 USER AGENTS IN LAST 600.000 HITS: "  >> ${CHECK_LOG} ;
tail -600000 ${MY_APACHE_LOG} | cut -d\" -f6 | sort | uniq -i -d  -c | sort -rh | head -10   >> ${CHECK_LOG}
echo -n "...20%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 IPs IN LAST 600.000 HITS:"  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} |awk '{print $1}' |sort | uniq -c | sort -rn | head -10  >> ${CHECK_LOG};
echo -n "...30%...";
echo " "  >> ${CHECK_LOG}
 
echo "RESPONSES IN LAST 600.000 HITS: "  >> ${CHECK_LOG};
tail -600000  ${MY_APACHE_LOG} |  awk '{print $9}'  | sort | uniq -c  >> ${CHECK_LOG};
echo -n "...30%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 URLs WITH 500 RESPONSE CODE: "  >> ${CHECK_LOG};
awk '($9 ~ /500/)' ${MY_APACHE_LOG}  | awk '{print $7}' | grep -v xcf | grep -v xce  | sort | uniq -c | sort -rn | head -10  >> ${CHECK_LOG};
echo -n "...40%...";
echo " "  >> ${CHECK_LOG}
 
echo "LAST 10 URLs WITH 500 RESPONSE CODE: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk '($9 ~ /500/)' | awk -F\" '{print $1 $2}' | tail -10 >> ${CHECK_LOG};
echo -n "...50%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 URLs WITH 400 (bad request) RESPONSE CODE: "  >> ${CHECK_LOG};
awk '($9 ~ /400/)' ${MY_APACHE_LOG}  | awk '{print $7}' | grep -v xcf | grep -v xce  | sort | uniq -c | sort -rn | head -10  >> ${CHECK_LOG};
echo -n "...55%...";
echo " "  >> ${CHECK_LOG}
 
echo "LAST 10 URLs WITH 400 (bad request) RESPONSE CODE: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk '($9 ~ /400/)' | awk -F\" '{print $1 $2}' | tail -10 >> ${CHECK_LOG};
echo -n "...60%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 URLs WITH 401 (unauthorized) RESPONSE CODE: "  >> ${CHECK_LOG};
awk '($9 ~ /401/)' ${MY_APACHE_LOG}  | awk '{print $7}' | grep -v xcf | grep -v xce  | sort | uniq -c | sort -rn | head -10  >> ${CHECK_LOG};
echo -n "...65%...";
echo " "  >> ${CHECK_LOG}
 
echo "LAST 10 URLs WITH 401 (unauthorized) RESPONSE CODE: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk '($9 ~ /401/)' | awk -F\" '{print $1 $2}' | tail -10 >> ${CHECK_LOG};
echo -n "...70%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 URLs WITH 403 (forbidden) RESPONSE CODE: "  >> ${CHECK_LOG};
awk '($9 ~ /403/)' ${MY_APACHE_LOG}  | awk '{print $7}' | grep -v xcf | grep -v xce  | sort | uniq -c | sort -rn | head -10  >> ${CHECK_LOG};
echo -n "...75%...";
echo " "  >> ${CHECK_LOG}
 
echo "LAST 10 URLs WITH 403 (forbidden) RESPONSE CODE: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk '($9 ~ /403/)' | awk -F\" '{print $1 $2}' | tail -10 >> ${CHECK_LOG};
echo -n "...80%...";
echo " "  >> ${CHECK_LOG}
 
echo "LAST 10 URLs WITH 404 (not found) RESPONSE CODE: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk '($9 ~ /404/)' | awk -F\" '{print $1 $2}' | tail -10 >> ${CHECK_LOG};
echo -n "...90%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 POPULAR URLs IN LAST 600.000 HITS: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk -F\" '{print $2}' | sort | uniq -c  | sort -rn |  head -10  >> ${CHECK_LOG};
echo -n "...100%...";
echo " "  >> ${CHECK_LOG}
 
echo "TOP-10 HOTLINKING OUR IMAGES IN LAST 600.000 HITS: "  >> ${CHECK_LOG};
tail -600000 ${MY_APACHE_LOG} | awk -F\" '($2 ~ /\.(jpg|gif|png|jpeg)/ && $4 != "-" && $4 !~ /^http:\/\/'${MYHOST}'\.'${MYTITLE}'\.'${MYTLD}'/){print $4}' | sort | uniq -c | sort -rn | head -10 >> ${CHECK_LOG}
echo  "...Ready!...";
 
echo " Sending email to webmaster";
mail  -s "${MYFQDN}: Apache Logs Report for ${DATESTAMP}" "${ADMIN_EMAIL}" < ${CHECK_LOG}
 
more ${CHECK_LOG};

Tip: You can keep a separate version of the script for each of your websites and execute them from a daily cronjob, to get a daily report for all of them (see the crontab example below).
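For instance, a root crontab entry like this one (the script path is hypothetical) would mail you the report every morning at 07:00:

0 7 * * * /root/bin/apache-monitor-linuxinsider.sh >/dev/null 2>&1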

Have fun!


2 Comments

  1. Akis Foulidis

    Try using the ELK stack and you will be amazed at what info you can extract from your logs, especially if you want to use it for other departments (besides admins/devs).

    It’s almost mandatory nowadays.

  2. Renan

    It worked very well!
    Thanks.
