Architecture
Hindsight consists of three types of plugins: Input, Analysis, and Output. All plugins are written in Lua and based on the Lua Sandbox project.
1. Input Plugins
Input plugins transform external data formats into Heka messages for processing. Each input plugin runs on a dedicated thread, and the messages generated by all input plugins are multiplexed into a single output stream.
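As an illustration only, the threading model described above can be sketched in Python (Hindsight itself is not written this way). The `run_input` and `multiplex` functions and the source names are hypothetical; `Logger` and `Payload` are used here as stand-ins for message fields.

```python
import queue
import threading

def run_input(name, records, stream):
    # Hypothetical input plugin: wraps each raw record in a message dict
    # and writes it to the shared stream.
    for raw in records:
        stream.put({"Logger": name, "Payload": raw})

def multiplex(sources):
    stream = queue.Queue()  # the single output stream shared by all inputs
    threads = [
        threading.Thread(target=run_input, args=(name, records, stream))
        for name, records in sources.items()
    ]
    for t in threads:  # one dedicated thread per input plugin
        t.start()
    for t in threads:
        t.join()
    out = []
    while not stream.empty():
        out.append(stream.get())
    return out

messages = multiplex({"syslog": ["a", "b"], "nginx": ["c"]})
```

The interleaving between sources depends on thread scheduling, so only the order within a single source is predictable in this sketch.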
2. Analysis Plugins
Analysis plugins are used for simple or complex event processing such as aggregation, sessionization, and anomaly detection. Analysis plugins can share a thread of execution; the distribution of work across threads is user configurable and can be tailored to specific use cases, e.g., by performance characteristics, workload distribution, or type. Every analysis thread uses a dedicated reader to process the data stream produced by the input plugins. The messages generated by all analysis plugins are multiplexed into a single output stream.
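A rough sketch of how plugins might be grouped onto shared threads is shown below. It assumes each plugin configuration carries a numeric `thread` setting; that key name and the `assign_threads` helper are assumptions made for illustration, not something this section specifies.

```python
from collections import defaultdict

def assign_threads(plugin_cfgs, thread_count):
    # Group plugin names by their (assumed) `thread` setting, wrapping
    # values that exceed the number of available analysis threads.
    threads = defaultdict(list)
    for name, cfg in sorted(plugin_cfgs.items()):
        threads[cfg.get("thread", 0) % thread_count].append(name)
    return dict(threads)

layout = assign_threads(
    {"counter": {"thread": 0}, "sessions": {"thread": 1}, "anomaly": {"thread": 0}},
    thread_count=2,
)
```

In this example the two lightweight plugins share thread 0, while the sessionization plugin gets thread 1 to itself.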
3. Output Plugins
Output plugins are used to transform the internal Heka messages into whatever format is needed, e.g., txt, html, tsv, or an alternate binary encoding. The transformed messages can then be fed to other systems such as data warehouses, information retrieval indexes, dashboards, and notifiers. Each output plugin runs on a dedicated thread and has its own readers to process the data streams produced by the input and analysis plugins. If an output plugin is subscribed to both the input and analysis streams, they are multiplexed with the oldest message being delivered first.
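The oldest-first multiplexing can be pictured as a merge of two ordered streams. The sketch below assumes that "oldest" is decided by a `Timestamp` field and that each stream is already ordered by it; how Hindsight actually orders messages is not stated here, and `merge_streams` is a hypothetical helper.

```python
import heapq

def merge_streams(input_stream, analysis_stream):
    # Lazily yield messages from both streams, smallest Timestamp first.
    # Assumes each stream is already sorted by Timestamp.
    return heapq.merge(input_stream, analysis_stream, key=lambda m: m["Timestamp"])

input_stream = [
    {"Timestamp": 1, "Payload": "in-1"},
    {"Timestamp": 4, "Payload": "in-2"},
]
analysis_stream = [
    {"Timestamp": 2, "Payload": "an-1"},
    {"Timestamp": 3, "Payload": "an-2"},
]
merged = [m["Payload"] for m in merge_streams(input_stream, analysis_stream)]
```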
4. Sample High Level Data Flow
5. Reliability
5.1. Durability
The durability of the data is equal to the durability of the storage where the output_path (data stream files and state preservation files) is configured.
5.2. Delivery Guarantee
The at-least-once delivery guarantee only applies to software failures. There are no guarantees if the underlying storage system fails. Crashing or killing the Hindsight process should not cause any messages to be lost or skipped and will, at most, duplicate/re-process one second of data on restart. However, sandbox state is only preserved on a clean shutdown. For example, if an analysis plugin was counting the number of messages going through the system and the system was killed, the count would be reset to the last saved state. This behaviour should be taken into account for any plugin that performs and preserves stateful analysis.
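The interaction between periodic checkpoints and state that is saved only on a clean shutdown can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the `process` function, the `disk` dictionary, and a checkpoint taken every three messages in place of the one-second interval described above.

```python
def process(stream, disk, crash_at=None, checkpoint_every=3):
    offset = disk["checkpoint"]    # resume from the last persisted read position
    count = disk["plugin_state"]   # plugin state as of the last clean shutdown
    delivered = []
    while offset < len(stream):
        if offset == crash_at:
            return delivered       # simulated kill: in-memory offset and count are lost
        delivered.append(stream[offset])
        count += 1
        offset += 1
        if offset % checkpoint_every == 0:
            disk["checkpoint"] = offset  # periodic checkpoint of the read position
    disk["checkpoint"] = offset
    disk["plugin_state"] = count         # state is written only on a clean shutdown
    return delivered

stream = ["m0", "m1", "m2", "m3", "m4", "m5"]
disk = {"checkpoint": 0, "plugin_state": 0}
first = process(stream, disk, crash_at=5)  # killed before m5 is read
second = process(stream, disk)             # restart, then clean shutdown
```

In this run no message is lost, m3 and m4 are delivered twice, and the saved count ends at 3 rather than 6 because the count accumulated before the kill was never written out.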
5.3. Dynamic Plugin Loading
The Hindsight infrastructure can be run as a service with no initial business logic loaded since all plugin types can be dynamically loaded and unloaded as necessary.
sandbox_load_path = "hs_load"
sandbox_run_path = "hs_run"
5.4. Starting a New Plugin
cp test.lua hs_load/analysis/test.lua
cp my_test.cfg hs_load/analysis/my_test.cfg
cp your_test.cfg hs_load/analysis/your_test.cfg
- Hindsight scans the directory
- test.lua is moved to hs_run/analysis/test.lua
- my_test.cfg is moved to hs_run/analysis/my_test.cfg
- Hindsight attempts to run my_test.cfg
- your_test.cfg is moved to hs_run/analysis/your_test.cfg
- Hindsight attempts to run your_test.cfg
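The steps above can be mimicked with a short Python simulation. It is not Hindsight's implementation: the `scan` function is hypothetical, and moving Lua files before configuration files is an assumption based on the order of the steps listed.

```python
import shutil
import tempfile
from pathlib import Path

def scan(load_dir, run_dir):
    """Move new files from load_dir to run_dir; return the cfg names to start."""
    load_dir, run_dir = Path(load_dir), Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    # Move the Lua sources first so the business logic is in place
    # before any configuration that refers to it is started.
    for src in sorted(load_dir.glob("*.lua")):
        shutil.move(str(src), str(run_dir / src.name))
    started = []
    for cfg in sorted(load_dir.glob("*.cfg")):
        shutil.move(str(cfg), str(run_dir / cfg.name))
        started.append(cfg.name)  # a real system would launch the plugin here
    return started

# Recreate the example layout in a temporary directory.
root = Path(tempfile.mkdtemp())
load = root / "hs_load" / "analysis"
run = root / "hs_run" / "analysis"
load.mkdir(parents=True)
for name in ("test.lua", "my_test.cfg", "your_test.cfg"):
    (load / name).write_text("")
started = scan(load, run)
```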
5.5. Restarting a Plugin
cp my_test.cfg hs_load/analysis/my_test.cfg
- Hindsight scans the directory
- my_test.cfg is moved to hs_run/analysis/my_test.cfg
- Hindsight attempts to restart my_test.cfg (no data gaps/loss)
5.6. Updating the Business Logic
cp test.lua hs_load/analysis/test.lua
- Hindsight scans the directory
- test.lua is moved to hs_run/analysis/test.lua
- Hindsight attempts to restart my_test.cfg and your_test.cfg since they use the same underlying business logic (no data gaps/loss)
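Deciding which plugins to restart comes down to finding every running configuration that points at the updated Lua file. The sketch below assumes each cfg names its source through a `filename = "..."` line; that key, the `plugins_using` helper, and the `other.cfg` entry are assumptions made for illustration.

```python
import re

def plugins_using(lua_name, cfgs):
    # Return the names of the configs whose (assumed) filename setting
    # refers to the given Lua source file.
    pattern = re.compile(r'^\s*filename\s*=\s*"([^"]+)"', re.MULTILINE)
    hits = []
    for cfg_name, text in sorted(cfgs.items()):
        match = pattern.search(text)
        if match and match.group(1) == lua_name:
            hits.append(cfg_name)
    return hits

cfgs = {
    "my_test.cfg": 'filename = "test.lua"\n',
    "your_test.cfg": 'filename = "test.lua"\n',
    "other.cfg": 'filename = "other.lua"\n',  # hypothetical unrelated plugin
}
to_restart = plugins_using("test.lua", cfgs)
```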
5.7. Stopping a Plugin
touch hs_load/analysis/my_test.off
- Hindsight scans the directory
- hs_load/analysis/my_test.off is deleted
- hs_run/analysis/my_test.cfg is renamed to hs_run/analysis/my_test.off
- Hindsight stops my_test.cfg
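The stop sequence can be simulated in the same spirit. The `handle_off` function below is a hypothetical illustration of the file operations listed above, not Hindsight's own code.

```python
import tempfile
from pathlib import Path

def handle_off(load_dir, run_dir):
    """Consume .off markers in load_dir and disable the matching running cfgs."""
    stopped = []
    for marker in sorted(Path(load_dir).glob("*.off")):
        marker.unlink()  # the marker in the load directory is deleted
        cfg = Path(run_dir) / (marker.stem + ".cfg")
        if cfg.exists():
            cfg.rename(cfg.with_suffix(".off"))  # my_test.cfg becomes my_test.off
            stopped.append(cfg.name)  # a real system would stop the plugin here
    return stopped

# Recreate the example layout in a temporary directory.
root = Path(tempfile.mkdtemp())
load = root / "hs_load" / "analysis"
run = root / "hs_run" / "analysis"
load.mkdir(parents=True)
run.mkdir(parents=True)
(run / "my_test.cfg").write_text("")
(load / "my_test.off").touch()
stopped = handle_off(load, run)
```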