Memory Bug to Blame for Google Docs Crash Wednesday

A change to Google Docs designed to improve real-time collaboration exposed a memory bug that brought down the cloud-based system for an hour Wednesday afternoon, Google engineering director Alan Warren wrote on the company's blog this morning. The bug affected users' access to the Google Docs list, documents, drawings and Apps Script.

This is the second outage in less than two weeks. On Aug. 26, the Google Docs list returned a 404 error for more than an hour, affecting a majority of users. Users could still reach their documents, however, through shared-document notifications or a document's direct link.

When a Google Docs user updates a document, a machine looks up the servers that need to be updated with the information. Because of the bug, the machines did not recycle their memory properly after each lookup, which caused them to run out of memory and restart, Warren wrote. While those machines were restarting, their load was transferred to the remaining machines, which caused them to run out of memory even faster. The servers could not keep up with the workload and Google Docs crashed.
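The failure pattern Warren describes, in which each restart pushes more work onto the survivors, can be illustrated with a toy simulation. The sketch below is not Google's code. The function name, the machine count, the memory figures and the restart time are all hypothetical, chosen only to show why the failures accelerate.

```python
def simulate_cascade(initial_used, capacity, total_load,
                     leak_per_request, restart_ticks, max_ticks):
    """Toy model of a leak-driven cascade (all parameters are hypothetical).

    Each tick, total_load requests are split evenly across the machines that
    are up. Every request leaks leak_per_request units of memory. A machine
    that reaches capacity restarts and is offline for restart_ticks ticks.
    Returns the number of machines that were up at the start of each tick.
    """
    used = [float(u) for u in initial_used]
    down_for = [0] * len(used)  # ticks left until each machine is back up
    alive_history = []

    for _ in range(max_ticks):
        alive = [i for i, d in enumerate(down_for) if d == 0]
        alive_history.append(len(alive))

        # Machines that are restarting count down toward coming back online.
        down_for = [max(0, d - 1) for d in down_for]

        if not alive:
            continue  # nothing is up, so this tick's requests are dropped

        # Survivors absorb the whole load, so each one leaks faster.
        share = total_load / len(alive)
        for i in alive:
            used[i] += share * leak_per_request
            if used[i] >= capacity:
                used[i] = 0.0  # restarting clears the leaked memory
                down_for[i] = restart_ticks

    return alive_history


if __name__ == "__main__":
    # Four machines that start with different amounts of leaked memory.
    print(simulate_cascade([60, 40, 20, 0], capacity=100, total_load=40,
                           leak_per_request=1.0, restart_ticks=20,
                           max_ticks=10))
```

With these made-up numbers, the first machine takes four ticks to fail, the second fails two ticks later, and the last two fail on consecutive ticks, leaving nothing running. A slow restart relative to the leak rate is what lets the pool empty out in this model.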

"Google docs is down. I am filled with impotent rage, too bad I can't write it out," Tasty Labs co-founder Nick Nguyen tweeted during the outage.

The Google team was alerted to the issue 60 seconds after the company's automated system registered a dramatic increase in failure rates. Google's engineering teams began rolling back the feature change that caused the failure 23 minutes later, a process that took 24 minutes.

Anyone who had a doc open in their browser could still copy and paste the text, though they could not update it.

"Terrible habit of not backing up Google Docs was canceled out by a terrible habit of never closing a browser window," Chris Baker, senior editor at Wired, tweeted.

The Google Documents application status dashboard indicates that the Google team was alerted to the issue in Documents at 2:28 p.m., 10 minutes after being alerted to the issue with the Google Docs list, as the company's machines ran out of memory and restarted, dumping an increased workload on the remaining units.

It appears that as soon as Google Documents went down, it took Drawings with it. The issue was resolved for both at 3:19 p.m.



