(p. 42) . . . as the web kept growing, Google added more machines–by the end of 1999, there were eighty machines involved in the crawl (out of a total of almost three thousand Google computers at that time)–and the likelihood that something would break increased dramatically. Especially since Google made a point of buying what its engineers referred to as “el cheapo” equipment. Instead of commercial units that carefully processed and checked information, Google would buy discounted consumer models without built-in processes to protect the integrity of data.
As a stopgap measure, the engineers had implemented a scheme where the indexing data was stored on different hard drives. If a machine went bad, everyone’s pager would start buzzing, even if it was the middle of the night, and they’d barrel into the office immediately to stop the crawl, copy the data, and change the configuration files. “This happened every few days, and it basically stopped everything and was very painful,” says Sanjay Ghemawat, one of the DEC research wizards who had joined Google.
. . .
(p. 43) The experience led to an ambitious revamp of the way the entire Google infrastructure dealt with files. “I always had wanted to build a file system, and it was pretty clear that this was something we were going to have to do,” says Ghemawat, who led the team. Though there had previously been systems that handled information distributed over multiple files, Google’s could handle bigger data loads and was more nimble at running full speed in the face of disk crashes–which it had to be because, with Google’s philosophy of buying supercheap components, failure was the norm. “The main idea was that we wanted the file system to automate dealing with failures, and to do that, the file system would keep multiple copies and it would make new copies when some copy failed,” says Ghemawat.
Levy, Steven. In the Plex: How Google Thinks, Works, and Shapes Our Lives. New York: Simon & Schuster, 2011.
(Note: ellipses added.)
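
The idea Ghemawat describes in the excerpt (keep multiple copies, and make new copies when one fails) can be illustrated with a toy, in-memory sketch. This is not Google's implementation: the class name `ReplicatedStore`, the method names, the machine labels, and the replica count of 3 are all assumptions made for illustration, and no real data is stored or copied.

```python
import random


class ReplicatedStore:
    """Toy model (hypothetical): tracks which machines hold a copy of each chunk."""

    def __init__(self, machines, replicas=3, seed=0):
        if replicas > len(machines):
            raise ValueError("need at least as many machines as replicas")
        self.replicas = replicas          # assumed target number of copies
        self.live = set(machines)         # machines currently considered healthy
        self.locations = {}               # chunk id -> set of machines holding a copy
        self.rng = random.Random(seed)    # seeded so runs are repeatable

    def write(self, chunk_id):
        """Place copies of a new chunk on distinct live machines."""
        count = min(self.replicas, len(self.live))
        self.locations[chunk_id] = set(self.rng.sample(sorted(self.live), count))

    def fail(self, machine):
        """Mark a machine as dead and re-create the copies it held elsewhere."""
        self.live.discard(machine)
        for holders in self.locations.values():
            holders.discard(machine)
            missing = self.replicas - len(holders)
            # A new copy can only be made if at least one copy survives.
            if holders and missing > 0:
                candidates = sorted(self.live - holders)
                extra = self.rng.sample(candidates, min(missing, len(candidates)))
                holders.update(extra)


store = ReplicatedStore(["m1", "m2", "m3", "m4", "m5"], replicas=3)
store.write("chunk-a")
victim = sorted(store.locations["chunk-a"])[0]
store.fail(victim)  # no pager, no manual copy: the store repairs itself
print(sorted(store.locations["chunk-a"]))
```

In this sketch the failure handler does automatically what the engineers in the excerpt did by hand: it notices the lost copy and restores the replica count from a surviving copy, as long as enough healthy machines remain.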