How can a commercial airliner vanish and remain lost for weeks in an age in which data on virtually everything is collected and maintained, either publicly or secretly? The answer is “quite easily.” It is one thing to collect huge amounts of information; it is quite another to fuse it quickly and meaningfully.
Much has been made about the reluctance of countries involved in the search for Malaysia Airlines Flight 370 to share sensitive aircraft tracking data that could help locate the doomed plane. But that is not the only issue.
The inability to connect the dots quickly is largely a technical problem. The world’s sensed data aren’t inherently organized, co-located, or structured to be interoperable, semantically or otherwise. Even if all the data were shared openly, it is simply not easy to marry disparate, dispersed data stores and perform timely searches to find connections, coincidences or an errant commercial aircraft.
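To see why marrying disparate data stores is hard even in miniature, consider a sketch in which two hypothetical archives describe the same aircraft but disagree on field names, identifiers and timestamp formats. Every name, record and value below is invented for illustration; real tracking feeds are far messier.

```python
# Two invented track archives for the same flight. Before they can be
# merged, each must be normalized into a shared schema -- the step that
# fails when data stores are dispersed and non-interoperable.
from datetime import datetime, timezone

# Source A: a radar log keyed by flight number, with ISO-8601 timestamps.
radar_log = [
    {"flight": "MH370", "time": "2014-03-08T01:07:00Z", "speed_kt": 471},
]

# Source B: a satellite log keyed only by a terminal ID, with epoch seconds.
sat_log = [
    {"terminal": "3FE021", "epoch": 1394240880, "doppler_hz": 132.0},  # 01:08 UTC
]

# The cross-reference linking terminals to flights -- the kind of lookup
# that often lives inside a single agency and nowhere else.
terminal_to_flight = {"3FE021": "MH370"}

def normalize_radar(rec):
    """Convert a radar record into the shared schema."""
    utc = datetime.strptime(rec["time"], "%Y-%m-%dT%H:%M:%SZ")
    return {"flight": rec["flight"],
            "utc": utc.replace(tzinfo=timezone.utc),
            "source": "radar"}

def normalize_sat(rec):
    """Convert a satellite record into the shared schema."""
    return {"flight": terminal_to_flight.get(rec["terminal"], "UNKNOWN"),
            "utc": datetime.fromtimestamp(rec["epoch"], tz=timezone.utc),
            "source": "satellite"}

# Only once both feeds share one schema can they be fused into a single
# chronological timeline for the flight.
timeline = sorted(
    [normalize_radar(r) for r in radar_log] +
    [normalize_sat(r) for r in sat_log],
    key=lambda rec: rec["utc"],
)
```

Even this toy version depends on a cross-reference table that may be held by only one party — which is the institutional half of the fusion problem.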
This is essentially a situation in which the amount of data spread across the globe overwhelms the ability of any one authority to bring it all together and process it. Government agencies and others around the world are studying how to cope with data at this scale.
Search engines like Google create the impression that all information is reachable and instantly available, but there is a tremendous enterprise the average user doesn’t see. Behind the curtain, Google runs systems that ingest, copy and analyze vast volumes of information shared on the web. The data are stored across millions of Google servers, metadata are created and indexed, and a user can retrieve an answer to nearly any question in a fraction of a second. But search engines and the data stores that feed them offer access to only a slice of the digital universe.
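The ingest-index-retrieve pipeline described above can be sketched with a toy inverted index. Real engines do this across millions of servers; the documents and queries here are invented for illustration.

```python
# A toy inverted index: ingest documents, index them by word, then
# answer queries instantly. Retrieval is fast precisely because the
# heavy work was done at indexing time.
documents = {
    1: "aircraft tracking data shared by radar stations",
    2: "satellite handshake data from the southern indian ocean",
    3: "tracking the missing aircraft with satellite data",
}

# Ingest and index: map each word to the set of documents containing it.
index = {}
for doc_id, text in documents.items():
    for word in text.lower().split():
        index.setdefault(word, set()).add(doc_id)

def search(query):
    """Return the IDs of documents containing every word in the query."""
    word_sets = [index.get(w, set()) for w in query.lower().split()]
    return set.intersection(*word_sets) if word_sets else set()
```

Here `search("satellite data")` returns documents 2 and 3 without rescanning any text — but, as with real search engines, only documents that were ingested can ever be found.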
The huge volumes of data collected by the world’s militaries, intelligence agencies and other government entities are not among the data Google can reach. And these secret data stores are enormous.
To appreciate the sheer volume of data that is accumulated around the world, consider that the U.S. Navy, just one military branch of one country, collects almost 200 terabytes, or 200 trillion bytes of data, every other day to track vessels around the globe. That’s about the amount of data that would be generated if all the books in the Library of Congress were digitized.
In 2009, now-retired Air Force General David Deptula warned, “we’re going to find ourselves in the not too distant future swimming in sensors and drowning in data.” The situation of exploding volumes of information is what’s known as “the big data problem” — having a data set so vast, in such a variety of formats, and from such diverse sources that it stresses the ability to conduct a timely search and draw relevant conclusions.
The search for Flight 370 appears caught up in this big data problem. The failure to find the missing aircraft demonstrates anew the serious gaps in data coordination and challenges public assumptions about the thoroughness and simplicity of searching the world’s data for answers.
Knowing what information to collect, where to find it, how to interpret it, and when to share it could improve the chances of quickly finding one commercial airliner in a vast ocean of data.