Data loss at Cloudflare after failed software update

Cloudflare admits that a faulty software update disrupted its logging-as-a-service offering, resulting in the loss of customer data.

So reports The Register. In a notice, the network management company acknowledges that its Cloudflare Logs service failed to deliver collected data to customers for about 3.5 hours on Nov. 14 and that about 55 percent of the logs were lost.

Cloudflare Logs collects logs generated by cloud services and sends them to customers who want to analyze them. Cloudflare suggests that these logs can be useful “for debugging, identifying configuration adjustments, and creating analytics, especially when combined with logs from other sources, such as your application server.”

Cloudflare customers often want logs from multiple servers, and because log data can be voluminous, the provider is concerned that processing all of it can become overwhelming.

Unworkable number of transactions

“Imagine the postal service ringing your doorbell once for each letter instead of once for each packet of letters,” Cloudflare states in its message. “With thousands or millions of letters each second, the number of separate transactions that would entail becomes prohibitive.”

That’s why Cloudflare uses a tool called Logpush to bundle logs into batches of predictable size, which are then sent to customers at regular intervals. Other tools, Logfwdr and Logreceiver, prepare the logs Cloudflare delivers to customers.
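The batching idea can be illustrated with a minimal sketch. This is not Cloudflare's code: the function name `batch_logs` and the parameter `max_batch_size` are hypothetical, and a real pipeline would presumably also flush on a timer and limit batches by byte size.

```python
from typing import Iterable, Iterator, List


def batch_logs(lines: Iterable[str], max_batch_size: int) -> Iterator[List[str]]:
    """Group log lines into batches of at most max_batch_size lines."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


# Ten events delivered in three batches instead of ten separate deliveries.
batches = list(batch_logs((f"event {i}" for i in range(10)), max_batch_size=4))
print([len(b) for b in batches])  # prints [4, 4, 2]
```

In the postal analogy, each batch is one ring of the doorbell: ten letters arrive in three deliveries rather than ten.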

Change to Logpush

On Nov. 14, Cloudflare made a change to Logpush intended to support an additional dataset.

The change was flawed: it “essentially informed Logfwdr that no customers had logs configured to be pushed.” Cloudflare staff noticed the problem and reverted the change within five minutes.

However, the incident triggered a second bug in Logfwdr. In situations like the Logpush error, it pushed all log events for all customers into the system instead of only those for customers that had a Logpush job set up. The resulting flood of data caused the failure and the loss of some logs.
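The failure mode described here, where an empty configuration is read as "forward everything", is often called failing open. The sketch below illustrates the pattern only. It is not Cloudflare's implementation, and the function names and event structure are hypothetical.

```python
from typing import Dict, List, Set

Event = Dict[str, str]


def select_events_fail_open(events: List[Event], configured: Set[str]) -> List[Event]:
    """Hypothetical buggy filter: an empty configuration disables filtering."""
    if not configured:
        return list(events)  # empty config: every customer's events are forwarded
    return [e for e in events if e["customer"] in configured]


def select_events_fail_closed(events: List[Event], configured: Set[str]) -> List[Event]:
    """Hypothetical safer filter: an empty configuration forwards nothing."""
    return [e for e in events if e["customer"] in configured]


events = [{"customer": c, "msg": "request"} for c in ("a", "b", "c")]

# Normal operation: only customer "a" has a push job configured.
normal = select_events_fail_open(events, {"a"})

# Faulty update: the configuration arrives empty.
flood = select_events_fail_open(events, set())
quiet = select_events_fail_closed(events, set())

print(len(normal), len(flood), len(quiet))  # prints 1 3 0
```

Neither behavior is free of cost: failing closed drops logs for as long as the bad configuration is live, while failing open risks overloading everything downstream, which is what the article describes.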

Cloudflare apologized for the incident. It admitted that most of the work to prevent this kind of problem had already been done, but that the job had not been finished. In its message, it compares the situation to forgetting to fasten a seatbelt: the safety systems are in place and working, but they are useless if not used.

Automated alerts

The network giant wants to avoid such mistakes in the future and plans to do so with automated alerts that would make misconfigurations “impossible to miss.” Brave words, notes The Register. The company is also planning additional tests to prepare for the impact of data center or network failures and system overloads.