```
                                _____
                            .xiX*****Xix.
                          .X7'        '4Xk,
                         dXl            'XX.        .
                        xXXl             XXl        .
                        4XXX             XX'
                       .  ,x            iX'   _,,xxii
                       |   ²|        ,iX7,xiiXXXXXXXl
                       |          .xi,xiXXXXXXXXXXXX:
                       .      ..iXXiXXXXXXXXXXXXXXX7.
                       .    .xXXXXXXXXXXXXXXX'XXXX7 .
                       |   ,XXXXXXXXXXXXXXXX'XXX7'  |
                       :  .XXXXX7*'"' 2XXX7'XX7'    |
  __/ \     _____    ____  \XX' _____  47'  ___  ___      _____     __
.\\_   \___/  _  \__/  _/_______\  _/______/  /  \  \____/  _  \___/  \  _____
. /     __    Y _ __   \__  _________  _____  \/\/   ____ _ _   ______ \/ __///
:/       /    |    \    |'   \/   \/    \/            \/    Y    \/   \    \  :
|\______/\_________/____|    /\____     /\_____/\_____/\____|____/\____\___/  |
+--------------------- \____/ --- \____/ ----:----------------------h7/dS!----+
                       .                     |      :
                       : .                   :      |
                       | .     Logswan       .      |
                       | :                       .  |
                       |_|_______________________|__|
                         |                       :
                                                 .
```
# Logswan

[![Build Status][1]][2]

Logswan is a fast Web log analyzer using probabilistic data structures. It is
targeted at very large log files, typically API logs. It has constant memory
usage regardless of the log file size, and takes approximately 4 MB of RAM.

Unique visitors counting is performed using two HyperLogLog counters (one for
IPv4, and another one for IPv6), providing a relative accuracy of 0.10%.
String representations of IP addresses are used and preferred as they offer
better precision.
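
To illustrate how a HyperLogLog counter can estimate unique visitors in a
fixed amount of memory, here is a minimal Python sketch. It is not Logswan's
implementation: the class name `HyperLogLog`, the hash function (SHA-1
truncated to 64 bits), and the default precision of 14 bits are assumptions
chosen for illustration. The expected relative error of HyperLogLog is roughly
1.04/√m for m registers, so a figure near 0.10% would correspond to about
2^20 registers.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical, not Logswan's code)."""

    def __init__(self, bits=14):
        self.bits = bits                    # number of index bits (assumed value)
        self.m = 1 << bits                  # number of registers
        self.registers = bytearray(self.m)  # one byte per register, fixed size

    def add(self, item):
        # Hash the string form of the address to a 64-bit integer.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.bits
        index = h >> width                  # top bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits
        # Rank: 1-based position of the leftmost 1-bit in `rest`.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

    def relative_error(self):
        # Theoretical standard error for m registers.
        return 1.04 / math.sqrt(self.m)


counter = HyperLogLog()
for i in range(1000):
    counter.add(f"192.0.2.{i % 250}")   # only 250 distinct addresses
print(round(counter.count()))            # close to 250, not 1000
```

Memory use depends only on the number of registers, never on the number of
addresses seen, which is what makes constant memory usage possible.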

Project design goals include: speed, memory-usage efficiency, and keeping the
code as simple as possible.

Logswan is **opinionated software**:

- It only supports the Common Log Format, in order to keep the parsing code
  simple. It can of course process the Combined Log Format as well (referer
  and user agent fields will be discarded)
- It does not split results per day, but log files can be split prior to
  being processed
- Input file size and bandwidth usage are reported in bytes; there are no
  plans to format or round them
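
The first two points can be sketched in Python as follows. This is only an
illustration under stated assumptions: the regular expression, the field
names, and the helpers `parse_line` and `split_by_day` are hypothetical and do
not reflect Logswan's own parser. The sketch shows why Combined Log Format
lines work unchanged (the trailing fields are never read) and how a log could
be split per day before processing.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<day>[^:\]]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"\s]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of CLF fields, or None if the line is invalid."""
    # match() only anchors at the start of the line, so any referer and
    # user agent fields that follow are simply ignored.
    m = CLF_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["hour"] = int(fields["hour"])
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


def split_by_day(lines):
    """Group raw log lines by their date string, e.g. '10/Oct/2000'."""
    days = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:      # invalid lines are skipped
            days[parsed["day"]].append(line)
    return dict(days)


example = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
           '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
           '"http://www.example.com/start.html" "Mozilla/4.08"')
print(parse_line(example))
```

Each group returned by `split_by_day` could then be written to its own file
and analyzed separately.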

Logswan is written with security in mind and runs sandboxed on OpenBSD
(using pledge). It has also been extensively fuzzed using AFL and Honggfuzz.

## Features

Currently implemented features:

- Counting used bandwidth
- Counting number of processed lines / invalid lines
- Counting number of hits (IPv4 and IPv6 hits)
- Counting visits (unique IP addresses for both IPv4 and IPv6)
- GeoIP lookups (for both IPv4 and IPv6)
- Hourly hits distribution
- HTTP method distribution
- HTTP protocol (HTTP/1.0, HTTP/1.1, or HTTP/2.0) distribution
- HTTP status codes distribution

## Dependencies

Logswan uses the `CMake` build system and requires `Jansson` and `libmaxminddb`
libraries and header files.

## Installing dependencies

- OpenBSD: `pkg_add -r cmake jansson libmaxminddb`
- NetBSD: `pkgin in cmake jansson libmaxminddb`
- FreeBSD: `pkg install cmake jansson libmaxminddb`
- Mac OS X: `brew install cmake jansson libmaxminddb`
- Alpine Linux: `apk add cmake gcc make musl-dev jansson-dev libmaxminddb-dev`
- Debian / Ubuntu: `apt-get install build-essential cmake libjansson-dev libmaxminddb-dev`
- Fedora: `yum install cmake gcc make jansson-devel libmaxminddb-devel`

## Building

	mkdir build
	cd build
	cmake ..
	make

Logswan has been successfully built and tested on OpenBSD, NetBSD, FreeBSD,
Mac OS X, and Linux with both Clang and GCC.

## Packages

Packages are available for the following operating systems:

- [OpenBSD][3]
- [NetBSD][4]
- [FreeBSD][5]
- [Debian][6]
- [Ubuntu][7]
- [Void Linux][8]

### GeoIP2 databases

Logswan looks for GeoIP2 databases in `${CMAKE_INSTALL_PREFIX}/share/dbip` by
default, which points to `/usr/local/share/dbip`.

A custom directory can be set using the `GEOIP2DIR` variable when invoking
CMake:

	cmake -DGEOIP2DIR=/var/db/dbip .

The free Creative Commons licensed DB-IP IP to Country Lite database can be
downloaded [here][9].

Alternatively, the GeoLite2 Country database from MaxMind can be downloaded
free of charge [here][10], but it requires accepting an EULA and is not freely
licensed.

## Usage

	logswan [-ghv] [-d db] file

If file is a single dash (`-`), logswan reads from the standard input.

The options are as follows:

	-d db	Specify path to a GeoIP database.
	-g	Enable GeoIP lookups.
	-h	Display usage.
	-v	Display version.

Logswan outputs JSON data to **stdout**.
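
Because the report is plain JSON on stdout, it is easy to consume from a
script. The following Python sketch assumes a `logswan` binary is available on
the `PATH`; the helper names `build_command` and `run_logswan` are
hypothetical, and passing `-g` together with `-d` is an assumption about how
the two options are meant to be combined. No particular JSON field names are
assumed: the report is returned as a plain dictionary.

```python
import json
import subprocess


def build_command(log_file, geoip_db=None):
    """Build the argv list for logswan from the options documented above."""
    cmd = ["logswan"]
    if geoip_db is not None:
        cmd += ["-g", "-d", geoip_db]   # enable GeoIP lookups with this database
    cmd.append(log_file)                # "-" makes logswan read standard input
    return cmd


def run_logswan(log_file, geoip_db=None, stdin_data=None):
    """Run logswan and return its JSON report as a Python dict."""
    result = subprocess.run(
        build_command(log_file, geoip_db),
        input=stdin_data,               # bytes fed on stdin when log_file is "-"
        stdout=subprocess.PIPE,
        check=True,                     # raise CalledProcessError on failure
    )
    return json.loads(result.stdout)
```

For example, `run_logswan("access.log")` would analyze a file on disk, while
`run_logswan("-", stdin_data=data)` would feed log lines held in memory as
bytes.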

## Measuring Logswan memory usage

Heap profiling can be done using valgrind, as follows:

	valgrind --tool=massif logswan access.log
	ms_print massif.out.*

## License

Logswan is released under the BSD 2-Clause license. See the `LICENSE` file for
details.

## Author

Logswan is developed by Frederic Cambus.

- Site: https://www.cambus.net

## Resources

Project homepage: https://www.logswan.org

GitHub: https://github.com/fcambus/logswan

[1]: https://api.travis-ci.org/fcambus/logswan.png?branch=master
[2]: https://travis-ci.org/fcambus/logswan
[3]: https://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/www/logswan
[4]: https://pkgsrc.se/www/logswan
[5]: https://www.freshports.org/www/logswan
[6]: https://packages.debian.org/search?keywords=logswan
[7]: https://packages.ubuntu.com/search?keywords=logswan
[8]: https://github.com/void-linux/void-packages/tree/master/srcpkgs/logswan
[9]: https://db-ip.com/db/lite.php
[10]: https://dev.maxmind.com/geoip/geoip2/geolite2/