logswan

Fast Web log analyzer using probabilistic data structures

```
                                _____
                            .xiX*****Xix.
                          .X7'        '4Xk,
                         dXl            'XX.        .
                        xXXl             XXl        .
                        4XXX             XX'
                       .  ,x            iX'   _,,xxii
                       |   ²|        ,iX7,xiiXXXXXXXl
                       |          .xi,xiXXXXXXXXXXXX:
                       .      ..iXXiXXXXXXXXXXXXXXX7.
                       .    .xXXXXXXXXXXXXXXX'XXXX7 .
                       |   ,XXXXXXXXXXXXXXXX'XXX7'  |
                       :  .XXXXX7*'"' 2XXX7'XX7'    |
  __/ \     _____    ____  \XX' _____  47'  ___  ___      _____     __
.\\_   \___/  _  \__/  _/_______\  _/______/  /  \  \____/  _  \___/  \  _____
. /     __    Y _ __   \__  _________  _____  \/\/   ____ _ _   ______ \/ __///
:/       /    |    \    |'   \/   \/    \/            \/    Y    \/   \    \  :
|\______/\_________/____|    /\____     /\_____/\_____/\____|____/\____\___/  |
+--------------------- \____/ --- \____/ ----:----------------------h7/dS!----+
                       .                     |      :
                       : .                   :      |
                       | .     Logswan       .      |
                       | :                       .  |
                       |_|_______________________|__|
                         |                       :
                                                 .
```
# Logswan

Logswan is a fast Web log analyzer using probabilistic data structures. It is
targeted at very large log files, typically API logs. It has constant memory
usage regardless of the log file size, and takes approximately 4 MB of RAM.

Unique visitor counting is performed using two HyperLogLog counters (one for
IPv4, and another one for IPv6), providing a relative accuracy of 0.10%.
String representations of IP addresses are used and preferred as they offer
better precision.
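
The 0.10% figure is consistent with the usual HyperLogLog error estimate. As a
rough check, assuming each counter uses 2^20 registers (an assumption made for
illustration; the register count is not stated in this document), the standard
error works out to approximately:

```latex
\[
\sigma \approx \frac{1.04}{\sqrt{m}}
       = \frac{1.04}{\sqrt{2^{20}}}
       = \frac{1.04}{1024}
       \approx 0.00102
       \approx 0.10\%
\]
```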

Project design goals include: speed, memory-usage efficiency, and keeping the
code as simple as possible.

Logswan is **opinionated software**:

- It only supports the Common Log Format, in order to keep the parsing code
  simple. It can of course process the Combined Log Format as well (referer
  and user agent fields will be discarded)
- It does not split results per day, but log files can be split prior to
  being processed
- Input file size and bandwidth usage are reported in bytes; there are no
  plans to format or round them
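
Since results are not split per day, one possible workflow is to split the log
on the date in the Common Log Format timestamp and analyze each part on its
own. The following is a minimal sketch, not part of Logswan: the sample
requests, the temporary directory, and the `access-<day>.log` file names are
all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch: split a Common Log Format file into one file per day, then
# analyze each day separately. File names and sample data are made up.

workdir=$(mktemp -d)

# Three sample requests spread over two days (illustrative data only)
cat > "$workdir/access.log" <<'EOF'
192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
192.0.2.2 - - [10/Oct/2000:14:02:11 -0700] "GET /about.html HTTP/1.0" 200 512
2001:db8::1 - - [11/Oct/2000:09:15:00 -0700] "GET /index.html HTTP/1.1" 404 0
EOF

# Field 4 looks like "[10/Oct/2000:13:55:36"; keep only the date part and
# use it to pick the output file (e.g. access-10-Oct-2000.log)
awk -v outdir="$workdir" '{
    split(substr($4, 2), ts, ":")
    day = ts[1]
    gsub("/", "-", day)
    print > (outdir "/access-" day ".log")
}' "$workdir/access.log"

# Run logswan on each per-day file, if it is installed
for f in "$workdir"/access-*.log; do
    if command -v logswan >/dev/null 2>&1; then
        logswan "$f" > "${f%.log}.json"
    fi
done
```

Each per-day file then yields its own JSON report next to it.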

Logswan is written with security in mind and runs sandboxed on OpenBSD
(using pledge). Experimental seccomp support is available for selected
architectures and can be enabled by setting the `ENABLE_SECCOMP` variable
to `1` when invoking CMake. It has also been extensively fuzzed using AFL
and Honggfuzz.

## Features

Currently implemented features:

- Counting used bandwidth
- Counting number of processed lines / invalid lines
- Counting number of hits (IPv4 and IPv6 hits)
- Counting visits (unique IP addresses for both IPv4 and IPv6)
- GeoIP lookups (for both IPv4 and IPv6)
- Hourly hits distribution
- HTTP method distribution
- HTTP protocol distribution
- HTTP status codes distribution

## Dependencies

Logswan uses the `CMake` build system and requires the `Jansson` and
`libmaxminddb` libraries and header files.

## Installing dependencies

- OpenBSD: `pkg_add -r cmake jansson libmaxminddb`
- NetBSD: `pkgin in cmake jansson libmaxminddb`
- FreeBSD: `pkg install cmake jansson libmaxminddb`
- macOS: `brew install cmake jansson libmaxminddb`
- Alpine Linux: `apk add cmake gcc make musl-dev jansson-dev libmaxminddb-dev`
- Debian / Ubuntu: `apt-get install build-essential cmake libjansson-dev libmaxminddb-dev`
- Fedora: `dnf install cmake gcc make jansson-devel libmaxminddb-devel`

## Building

	mkdir build
	cd build
	cmake ..
	make

Logswan has been successfully built and tested on OpenBSD, NetBSD, FreeBSD,
macOS, and Linux with both Clang and GCC.

## Packages

Logswan packages are available for:

- [OpenBSD][1]
- [NetBSD][2]
- [FreeBSD][3]
- [Debian][4]
- [Ubuntu][5]
- [Void Linux][6]
- [Homebrew][7]

### GeoIP2 databases

Logswan looks for GeoIP2 databases in `${CMAKE_INSTALL_PREFIX}/share/dbip` by
default, which resolves to `/usr/local/share/dbip` with the default install
prefix.

A custom directory can be set using the `GEOIP2DIR` variable when invoking
CMake:

	cmake -DGEOIP2DIR=/var/db/dbip .

The free Creative Commons licensed DB-IP IP to Country Lite database can be
downloaded [here][8].

Alternatively, the GeoLite2 Country database from MaxMind can be downloaded
free of charge [here][9], but it requires accepting an EULA and is not freely
licensed.

## Usage

	logswan [-ghv] [-d db] logfile

If `logfile` is a single dash (`-`), logswan reads from the standard input.

The options are as follows:

	-d db	Specify path to a GeoIP database.
	-g	Enable GeoIP lookups.
	-h	Display usage.
	-v	Display version.

Logswan outputs JSON data to **stdout**.
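
The synopsis above can be exercised as in the following sketch. It only uses
the options listed in this section. The sample log line, the temporary
directory, the report file names, and the database file name
`dbip-country-lite.mmdb` are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch of typical invocations. Paths and sample data are hypothetical.

workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
EOF

if command -v logswan >/dev/null 2>&1; then
    # Analyze a file and save the JSON report written to stdout
    logswan "$workdir/access.log" > "$workdir/report.json"

    # Same analysis, reading the log from standard input; this form is
    # handy at the end of a pipeline, e.g. after a decompressor
    logswan - < "$workdir/access.log" > "$workdir/report-stdin.json"

    # Enable GeoIP lookups with an explicit database (hypothetical file name)
    db=/usr/local/share/dbip/dbip-country-lite.mmdb
    if [ -f "$db" ]; then
        logswan -g -d "$db" "$workdir/access.log" > "$workdir/report-geoip.json"
    fi
else
    echo "logswan is not installed; skipping" >&2
fi
```

Because the report goes to stdout, it can be redirected to a file as above or
piped into any JSON-aware tool.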

## License

Logswan is released under the BSD 2-Clause license. See `LICENSE` file for
details.

## Author

Logswan is developed by Frederic Cambus.

- Site: https://www.cambus.net

## Resources

Project homepage: https://www.logswan.org

GitHub: https://github.com/fcambus/logswan

[1]: https://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/www/logswan
[2]: https://pkgsrc.se/www/logswan
[3]: https://www.freshports.org/www/logswan
[4]: https://packages.debian.org/search?keywords=logswan
[5]: https://packages.ubuntu.com/search?keywords=logswan
[6]: https://github.com/void-linux/void-packages/tree/master/srcpkgs/logswan
[7]: https://formulae.brew.sh/formula/logswan
[8]: https://db-ip.com/db/lite.php
[9]: https://dev.maxmind.com/geoip/geoip2/geolite2/