1. Performance

It is very preliminary but I setup a test using some Nginx logs. The logs were split into three files each containing 1,002,646 log lines. The test consists of parsing the Nginx logs, converting them to a Heka protobuf stream, and writing them back to disk. The configurations were setup to be as equivalent as posssible (single output, similar flush intervals, and individual readers for each file in the concurrent test). The configs can be found here: https://github.com/mozilla-services/hindsight/tree/master/benchmarks.

1.1. Test Hardware

Lenovo x230 Thinkpad
Memory 8GB
Processor Intel Core i7-3520M CPU @ 2.90GHz x 4
Disk SSD

1.2. Test 1 - Processing a single log file

1.2.1. Hindsight (0.8)

RSS: 3276
VIRT: 253148
SHR: 1140
Processing time: 9.88 seconds
Log Lines/sec: 101482

1.2.2. Heka (0.9)

RSS: 39764
VIRT: 662204
SHR: 5240
Processing time: 63 seconds
Log Lines/sec: 15,915

1.2.3. Summary

Approximately 12x less resident memory
Approximately 2.5 less virtual memory
Over 6x the throughput
Hindsight (kill -9) Caused some corruption in the stream but no messages were lost and 3 were duplicated.
- Error unmarshalling message at offset: 275181251 error: proto: field/encoding mismatch: wrong type for field
- Corruption detected at offset: 275181251 bytes: 317
- Processed: 1002650, matched: 1002649 messages
Heka (kill -9) record count: 1,002,595 so at least 51 records were lost.

1.3. Test 2 - Processing all three log files concurrently

1.3.1. Hindsight (0.8)

RSS: 5488
VIRT: 327872
SHR: 1128
Processing time: 17.84 seconds
Log Lines/sec: 168,606

1.3.2. Heka (0.9)

RSS: 42060
VIRT: 801432
SHR: 5256
Processing time: 198 second
Log Lines/sec: 15,192 # actually slower than processing them sequentially

1.3.3. Summary

Approximately 7.7x less resident memory
Approximately 2.4x less virtual memory
Over 11x the throughput

Performance