getting hosed by a xen bug that was not caught by tcp checksums (skipped due to weird network config) nor zookeeper noticing its thread fucked off https://www.pagerduty.com/blog/the-discovery-of-apache-zookeepers-poison-packet/
the gc would peace out for minutes, observed to be caused by fragmentation https://web.archive.org/web/20121101040711/http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
linux kernel forgets to check tcp checksum because "unnecessary" but it actually was necessary, woops https://tech.vijayp.ca/linux-kernel-bug-delivers-corrupt-tcp-ip-data-to-mesos-kubernetes-docker-containers-4986f88f7a19