On November 5th, 2012, at roughly 6pm PST, Google’s services went offline. A lot of companies rely on Gmail, Google Docs and Google Drive to exchange work-related content, so losing access to these services is a big deal. Understanding some of the pitfalls of the internet helps explain how this can happen, even to Google.
We’ll keep it simple: the internet is held together by a collection of networks known as “Autonomous Systems” (AS), and every network has its own AS number.
That being said, these networks are connected by a routing protocol called the “Border Gateway Protocol”. BGP, as it is known, tells each network which IP addresses belong to which AS, and traffic is routed between networks accordingly.
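To make that a bit more concrete, here is a minimal Python sketch of the "which IP addresses belong to which network" idea. It is not how a real router stores routes. The table, the `origin_as` function name, the address ranges (reserved documentation prefixes) and the AS numbers (reserved documentation AS numbers) are all made up for illustration.

```python
import ipaddress

# Hypothetical table: address range (prefix) -> AS number that announces it.
# The prefixes and AS numbers are documentation-only values, not real networks.
ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def origin_as(address):
    """Return the AS number whose prefix covers the address, or None."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ORIGINS if ip in net]
    if not matches:
        return None
    # If several prefixes cover the address, the most specific one wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return ORIGINS[best]


print(origin_as("192.0.2.10"))   # covered by the first prefix
print(origin_as("203.0.113.1"))  # not covered by any prefix in the table
```

The lookup only answers "who says they own this address?". Nothing in it checks whether the claim is true.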
Routes are pathways from one IP address to another. The ASes along the way simply establish the connections. That’s cool, but the question remains: why does this stuff matter? BGP relies heavily on trust. In other words, if one company says it is behind a specific IP address, no one questions it. Traffic for that address is simply routed along the announced pathway.
The trouble starts when a network states it is behind a given address when it really is not. Sometimes it’s a malicious act and other times it’s not. In Google’s case, it appeared that routes to its IP addresses were misrouted or “leaked”, or simply not announced correctly over BGP. Still with me? Okay, here we go:
Whether the reasons are malicious or not, IP addresses can be “leaked” outside their normal paths. More likely than not, this was an honest mistake and a reminder that failures of the BGP system do happen from time to time. So what’s the solution? Most network engineers have established relationships with each other, so they can get in touch when a route leak occurs.
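Here is a rough Python sketch of why a leak hurts and why a phone call can fix it. It is a toy under loud assumptions: the `ToyRouter` class, its methods, the prefix and the AS numbers are all hypothetical, and the selection rule (most specific prefix, then shortest AS path) is a heavy simplification of what real BGP routers do.

```python
import ipaddress


class ToyRouter:
    """Toy router that accepts every announcement on trust (an assumption)."""

    def __init__(self):
        self.routes = []  # list of (network, as_path) tuples

    def announce(self, prefix, as_path):
        # No verification at all: whoever announces a prefix is believed.
        self.routes.append((ipaddress.ip_network(prefix), tuple(as_path)))

    def withdraw(self, prefix, first_as):
        # Drop the routes for this prefix that were learned via first_as.
        net = ipaddress.ip_network(prefix)
        self.routes = [
            r for r in self.routes if not (r[0] == net and r[1][0] == first_as)
        ]

    def best_path(self, address):
        ip = ipaddress.ip_address(address)
        candidates = [r for r in self.routes if ip in r[0]]
        if not candidates:
            return None
        # Simplified rule: most specific prefix first, then shortest AS path.
        _, path = min(candidates, key=lambda r: (-r[0].prefixlen, len(r[1])))
        return path


router = ToyRouter()
router.announce("192.0.2.0/24", [64501, 64502, 64500])  # legitimate path
print(router.best_path("192.0.2.10"))

router.announce("192.0.2.0/24", [64510])  # leak: looks like a shorter path
print(router.best_path("192.0.2.10"))

router.withdraw("192.0.2.0/24", 64510)  # the engineers made the phone call
print(router.best_path("192.0.2.10"))
```

In this toy, the leaked route wins simply because it looks shorter, and traffic snaps back to the legitimate path once the leaking network stops announcing it.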
Chances are, once a network becomes unresponsive, an engineer gets on the phone to ask the networks involved to stop announcing routes they don’t actually represent. As soon as that happens, BGP can route traffic for those IP addresses to the correct place again. This may all seem moot, since an update from Google stated it was simply a hardware failure. Go figure. Still, route leaks do happen and will happen again. The internet’s a wild place, isn’t it?
For more information, contact James Mulvey.