phpMWSLP - php multi-core webserver log parser

About the program

What sets this program apart is that it can process a web server log file in multi-threaded (multi-core) mode.
How it works:
A large log file is cut into fragments, which are processed in several threads (you can specify the processor core number for each thread).
Data is stored in a SQLite database as JSON strings.
The program creates a record in the SQLite database for the specified date and updates it as the threads finish.
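The fragment-and-process idea can be sketched in shell as follows. This is only an illustration, not the project's own scripts: the function name `split_and_count`, the `part_` file prefix and the per-fragment work (counting lines) are hypothetical, and the sketch assumes GNU coreutils `split`.

```shell
#!/bin/sh
# Hypothetical sketch: cut a log into N line-aligned fragments and
# process each fragment in its own background job.
# Assumes GNU coreutils `split` (the -n l/N option).

split_and_count() {
    log_file=$1      # input log file
    parts=$2         # number of fragments / background jobs
    out_dir=$3       # working directory for the fragments

    mkdir -p "$out_dir"
    # l/N splits into N chunks without breaking lines; -d gives
    # numeric suffixes (part_00, part_01, ...).
    split -n "l/$parts" -d "$log_file" "$out_dir/part_"

    # One background job per fragment; each writes its own result file.
    for part in "$out_dir"/part_[0-9][0-9]; do
        wc -l < "$part" > "$part.count" &
    done
    wait    # block until every background job has finished

    # Merge the per-fragment results into one total.
    cat "$out_dir"/part_[0-9][0-9].count |
        awk '{ total += $1 } END { print total + 0 }'
}
```

For example, `split_and_count access.log 4 /tmp/parts` prints the total number of lines after counting four fragments in parallel. Writing one result file per job and merging afterwards avoids concurrent writes to a shared output.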

The program currently has the following modules:

  • module_browsers_names // Browsers names
  • module_browsers_versions // Top 100 browsers versions
  • module_ip_top_100 // Top 100 IP addresses
  • module_ip_unique_count // Total number of unique IP addresses
  • module_display_resolutions // Screen resolutions
  • module_device_type // Device type
  • module_windows_version // Windows versions
  • module_os_info // Information about operating systems
  • module_iphone_version // iPhone version
  • module_ipad_version // iPad version
  • module_macintosh_version // iMac version
  • module_android_version // Android version
  • module_social_networks // Referrals from social networks
  • module_search_engines // Referrals from search engines
  • module_all_referal_links // All referral links
  • module_all_requests // All requests to the web server
  • module_all_referal_links_exclude_known // All referrals from sources excluding known ones
  • module_search_engines_google // Referrals from Google
  • module_status_404_requests // All requests with status 404
  • module_status_200_requests // All requests with status 200
  • module_10min_online_users_count // Number of online users every 10 minutes
  • module_day_online_users_count // Number of online users between selected dates
  • module_cities // City statistics
  • module_countriesISO // Country code statistics
  • module_countries // Country statistics

The results can be viewed with the JavaScript C3 library:

module_day_online_users_count:

module_windows_versions:

module_device_type, module_10min_online_users_count:

Configuration


First of all, you need to check the web server log configuration.
For Nginx, set the following log format:

log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';

When you access the site, you should see an entry like this (in your web server log file):

192.168.1.2 - - [11/Oct/2023:03:34:02 +0300] "GET https%3A//192.168.1.1/test.html HTTP/1.0" 200 49 "https://192.168.1.1/test.html" "Mozilla/5.0 (Linux; arm_64; Android 13; 21081111RG) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 YaBrowser/23.7.5.95.00 SA/3 Mobile Safari/537.36"  "192.168.1.2"
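To check that your log lines match the expected format, a small shell helper along these lines may be useful. It is a sketch, not part of phpMWSLP: the name `parse_log_line` is hypothetical, and it assumes the `main` log_format above with no literal double quotes inside the quoted fields.

```shell
#!/bin/sh
# Hypothetical helper: print IP, status and user agent of each log line
# read from stdin, tab-separated. It splits on the double quote, so it
# assumes the "main" log_format shown above.
parse_log_line() {
    awk -F'"' '{
        split($1, head, " ")    # head[1] = $remote_addr
        split($3, mid, " ")     # mid[1] = $status, mid[2] = $body_bytes_sent
        # $2 = request, $4 = referer, $6 = user agent, $8 = x_forwarded_for
        printf "%s\t%s\t%s\n", head[1], mid[1], $6
    }'
}
```

Usage: `tail -n 5 access.log | parse_log_line`. If the columns come out shifted, the log format probably differs from the one expected here.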

To collect additional information such as display resolutions and referral links, add the code below to the pages of your website:

<script type="text/javascript">
// Tracking pixel: requests /stat with the referrer, screen size,
// page URL, platform and user agent packed into the query string.
document.write("<img src='https://your-site-name/stat?s=1;x"+
escape(document.referrer)+
((typeof(screen)=="undefined")?"":";x"+
screen.width+"x"+screen.height+"x"+
(screen.colorDepth?screen.colorDepth:screen.pixelDepth))+
";x"+escape(document.URL)+";x"+escape(navigator.platform)+
";x"+escape(navigator.userAgent)+
"' alt='' border=0 width=0 height=0>")
</script>
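The snippet packs its values into the request path, separated by `;x`, so they end up in the web server log. As an illustration of how such a request could be unpacked, here is a shell sketch; the helper name `stat_resolution` is hypothetical and this is not necessarily how module_display_resolutions does it.

```shell
#!/bin/sh
# Hypothetical helper: extract "WIDTHxHEIGHT" from a /stat request path
# produced by the snippet above. Fields are separated by ";x":
#   $1 = /stat?s=1, $2 = referrer, $3 = WxHxDEPTH, $4 = page URL, ...
stat_resolution() {
    printf '%s\n' "$1" | awk -F';x' '{
        # The screen field is omitted when `screen` is undefined,
        # so only accept a value that looks like 1280x720x24.
        if ($3 ~ /^[0-9]+x[0-9]+x[0-9]+$/) {
            split($3, dim, "x")
            print dim[1] "x" dim[2]
        }
    }'
}
```

The format check matters because, without the screen field, the third position would hold the escaped page URL instead.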

Create an empty file named stat in your site's root directory.

Now that logging is configured, you can set up the program:

  1. Edit the file app/config/config.php:

    • Set the path to the root directory (app) PHP_MWSLP_ROOT
    • Set the name of the SQLite database PHP_MWSLP_PDO_TABLENAME
    • Set the path to the GeoIP database GeoLite2-City.mmdb PHP_MWSLP_GEO_DB
    • Set the URL to the public folder PHP_MWSLP_HTTP

    Additionally, you can enable the following variables:

    • Show module output (false/true) PHP_MWSLP_SHOW_MODULE_OUTPUT
    • Show SQL query (false/true) PHP_MWSLP_SHOW_SQL_QUERY
    • Show line counter (false/true) PHP_MWSLP_SHOW_LINE_COUNTER
    • URL length for output PHP_MWSLP_URL_LENGTH
    • Read the log file from gz archive (false/true) PHP_MWSLP_LOG_IS_GZIP
    • Show horizontal line on the chart (false/true) PHP_MWSLP_CHART_YGRID_LINE
  2. Edit the file app/scripts/config:

    • Set the date for which the log will be processed:
      log_date=01.05.2024
    • Set the directory where the web server logs are stored:
      log_path=/dist/STAT/01.24
    • Set the name prefix of the log files:
      log_prefix=access.log
      (the /dist/STAT/01.24/ folder should contain the web server log files, for example:
      access.log-20240101.gz
      access.log-20240102.gz
      access.log-20240103.gz
      access.log-20240104.gz)
    • Set the working folder for intermediate log files:
      log_tmp=/var/www/html/git/web_stat_tmp/app/tmp
    • Set the folder where the web server log file will be cut into parts for multi-threaded processing:
      log_tmp_parts=/var/www/html/git/web_stat_tmp/app/tmp/parts
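Before running anything, it can help to check that all five settings are present. The sketch below assumes the config file contains only plain key=value lines like those shown above, so that it can be sourced as shell; the function name `check_config` is hypothetical and not part of the project.

```shell
#!/bin/sh
# Hypothetical check: source the key=value config and make sure every
# expected setting is non-empty. Prints the archive pattern on success.
check_config() {
    config_file=$1
    [ -r "$config_file" ] || { echo "cannot read $config_file" >&2; return 1; }

    # Plain key=value lines are valid shell assignments.
    . "$config_file"

    for key in log_date log_path log_prefix log_tmp log_tmp_parts; do
        eval "value=\${$key:-}"     # look up the variable named by $key
        if [ -z "$value" ]; then
            echo "missing setting: $key" >&2
            return 1
        fi
    done
    echo "$log_path/$log_prefix-*.gz"
}
```

Pass a path that contains a slash (for example `check_config app/scripts/config`), since `.` searches PATH for bare file names.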

Now you can create a database to store the log data:
Go to the /app/install/ folder and run app/install/create_db.php (for example, run: /usr/bin/php -q app/install/create_db.php)
Afterwards, the app/sql/ folder should contain a SQLite database.

Now you can try to create the first entry.

Steps to launch the program:

  1. Run the script app/scripts/1_log_glue.sh (merges the logs for today, yesterday and tomorrow)

  2. Run the script app/scripts/2_grep_date.sh (collects the log entries for the specified date)

  3. Run the script app/scripts/3_split_file.sh (cuts the log file into fragments for multi-threaded processing)

  4. Run the script app/scripts/4_first.sh (creates the first record in the SQLite database, which the processing threads will then update)

  5. Run the script app/scripts/5_parse_core.sh (launches 3 background threads (0, 1, 2) that process the log file). These threads run the modules from the app/modules/ folder:

    • The "0" thread will collect the following information:
      (modules_core0)

      • module_browsers_names // Browsers names
      • module_browsers_versions // Top 100 browsers versions
      • module_ip_top_100 // Top 100 IP addresses
      • module_ip_unique_count // Total number of unique IP addresses
      • module_display_resolutions // Screen resolutions
      • module_device_type // Device type
      • module_windows_version // Windows versions
      • module_os_info // Information about operating systems
      • module_iphone_version // iPhone version
      • module_ipad_version // iPad version
    • The "1" thread will collect the following information:
      (modules_core1)

      • module_social_networks // Referrals from social networks
      • module_search_engines // Referrals from search engines
      • module_all_referal_links // All referral links
      • module_all_requests // All requests to the web server
      • module_all_referal_links_exclude_known // All referrals from sources excluding known ones
      • module_search_engines_google // Referrals from Google
      • module_status_404_requests // All requests with status 404
      • module_status_200_requests // All requests with status 200
    • The "2" thread will collect the following information:
      (modules_core2)

      • module_10min_online_users_count // Number of online users every 10 minutes
    • The "3" thread will collect the following information:
      (modules_core3)

      • module_cities // Cities statistics
      • module_countriesISO // Countries code statistics
      • module_countries // Countries statistics

    The modules of thread "3" described above are excluded from processing by default, since they are the most resource-intensive and are processed by a separate multi-threaded (multi-processor) script:

    1. Run the file app/scripts/parse_to_file.sh (this script launches 4 background processing threads that collect Cities and Countries)

    2. Make sure that the previous threads have completed (for example, using the command ps -aux | grep parse_to_file.sh) and then run the script app/scripts/merge.sh (this script merges the results of the 4 previous threads and adds the data to the database)
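Steps 1 to 4 above run one after another, so they could be chained in a small wrapper that stops at the first failure. This is a sketch under assumptions: the function name `run_prepare_steps` is hypothetical, the scripts are assumed to run under bash and to report failure through a non-zero exit status, and any working-directory requirements of the real scripts are not handled.

```shell
#!/bin/sh
# Hypothetical wrapper: run the preparation steps in order and stop at
# the first one that fails. Takes the scripts folder as its argument.
# Assumes the scripts can be run with bash.
run_prepare_steps() {
    scripts_dir=$1
    for step in 1_log_glue.sh 2_grep_date.sh 3_split_file.sh 4_first.sh; do
        echo "running $step"
        bash "$scripts_dir/$step" || {
            echo "step failed: $step" >&2
            return 1
        }
    done
}
```

Usage: `run_prepare_steps app/scripts && bash app/scripts/5_parse_core.sh`.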

Other useful scripts in the app/scripts/ folder:

To parse multiple dates, you can use app/scripts/multiple_parse.sh

Instead of running the app/scripts/5_parse_core.sh script, you can run each thread separately:

  • parse_core0.sh
  • parse_core1.sh
  • parse_core2.sh
  • parse_core3.sh
Also, instead of running app/scripts/parse_to_file.sh, you can run each thread separately:
  • parse_to_file_core0.sh
  • parse_to_file_core1.sh
  • parse_to_file_core2.sh
  • parse_to_file_core3.sh
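
When the per-core scripts are started from a single shell, the shell's `wait` builtin can replace the manual `ps` check before merging. The sketch below is an assumption-laden illustration: the function name `parse_and_merge` is hypothetical, and it assumes each parse_to_file_core*.sh script runs under bash, stays in the foreground until its work is done, and exits non-zero on failure.

```shell
#!/bin/sh
# Hypothetical wrapper: start the four parse_to_file_core*.sh scripts as
# background jobs, wait for all of them, then run merge.sh.
parse_and_merge() {
    scripts_dir=$1
    pids=""
    for core in 0 1 2 3; do
        bash "$scripts_dir/parse_to_file_core$core.sh" &
        pids="$pids $!"             # remember each job's process ID
    done

    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1     # collect each job's exit status
    done

    if [ "$failed" -ne 0 ]; then
        echo "a parse job failed, skipping merge" >&2
        return 1
    fi
    bash "$scripts_dir/merge.sh"
}
```

Usage: `parse_and_merge app/scripts`. Waiting on each process ID separately lets the wrapper skip the merge when any one job fails.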

Requirements

  • PHP >= 7.3 (8 is recommended)
  • PDO
  • SQLite
  • MaxMind DB Reader
  • MaxMind Web Service Clients
  • GeoIP2 PHP API

License

MIT License