By Marlene Cimons, National Science Foundation
In 1948, American mathematician and electronic engineer Claude Shannon published a paper that became the foundation of “information theory,” a concept that examines the limits of how society represents and transmits information. His groundbreaking ideas opened the way for today’s trillion-dollar communications industry.
In “A Mathematical Theory of Communication,” Shannon quantified the limits of compressing, storing and transmitting data, presaging the current Information Age, with its proliferation of high-speed Internet, CDs, DVDs and wireless technology. Shannon described the fundamental problem of communication as “reproducing at one point either exactly or approximately a message selected at another point.”
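The limit Shannon quantified for compression is entropy: the minimum average number of bits per symbol needed to encode a message drawn from a given symbol distribution. A minimal illustration (not from the article; the function name and example string are ours) computes it from a message's empirical symbol frequencies:

```python
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol of the message's empirical
    symbol distribution: H = -sum(p * log2(p))."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# "aaab": p(a) = 3/4, p(b) = 1/4, so H ≈ 0.811 bits/symbol —
# no lossless code can average fewer bits per symbol than this.
print(round(shannon_entropy("aaab"), 3))  # → 0.811
```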
Information theory was a big hit with communications engineers and other researchers, who have been building on Shannon's work ever since. Recently, scientists created the Science of Information Center at Purdue University to advance science and technology through a new quantitative understanding of information processing in a wide-ranging array of systems, including biological, physical, social and engineered systems.
“Shannon started this communication revolution with a paper and a mathematical model, from which DVDs, the Internet and CDs came,” says Wojciech Szpankowski, director of the new center and a computer sciences professor at Purdue. “Engineers put into existence what he predicted more than 60 years ago.”
The center “is working in the same spirit in trying to build something fundamental and apply it to real life problems,” he adds. “To keep pace with rapid advances in networking, biology and quantum information processing, we need to rethink how we understand and integrate information. By assimilating elements of space, time, structure, semantics and context, we will deepen our understanding of information and apply these results to critical problems in society.”
The National Science Foundation supports the center, one of NSF’s Science and Technology Centers, with $25 million in funding over five years. Purdue University is the lead institution, with research partners at the Massachusetts Institute of Technology, Stanford University, the University of California at Berkeley, Princeton University, Howard University, Bryn Mawr College, the University of California at San Diego and the University of Illinois at Urbana-Champaign.
Researchers also are developing an education program, including a new course called “Science of Information,” which will introduce undergraduates to information and communication theory and its open problems.
Center researchers hope to define and develop the core principles that govern information transfer, and apply this knowledge to problems in the physical and social sciences and in engineering, including, for example, financial transactions, patterns of consumer behavior, and cellular communication in molecular biology. The results could have wide-ranging applications in numerous fields, from disease detection to developing the next generation of wireless networks.
“Information provides the essential substrate and unifying theme for virtually all complex interacting systems,” Szpankowski says. “Understanding information flow, therefore, holds the key to comprehending and building more efficient systems.”
In view of this, the center focuses its research on three major areas: life sciences, communication and the extraction of knowledge from massive datasets. Predictably, the three overlap on occasion.
For example, scientific researchers around the world are collecting huge amounts of genomic sequence data at a rate that outstrips conventional storage capacities. “We are looking at the different types of data being generated, how they are processed, and what questions are later asked about them to come up with intelligent compression schemes that will allow long-term storage of this valuable but copious information stream,” says Tsachy Weissman, associate professor of electrical engineering at Stanford University, and a center researcher.
“More concretely, we have been studying the fundamental limits, and computationally viable schemes for approaching those limits, in compression of genomic data given highly correlated genomic data already available on a database,” Weissman adds. “For example, compression of one individual’s genome, given that the genome of another individual from the same species is already on the database.”
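The idea behind compressing one genome given another is that two individuals of the same species share most of their sequence, so only the differences need to be stored. A minimal sketch of that principle (ours, not the center's actual method; substitutions only, whereas real genomic compressors also handle insertions, deletions and entropy-code the result):

```python
# Reference-based compression sketch: store only the positions where a
# target sequence differs from an equal-length reference sequence.

def compress(reference: str, target: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where target differs from reference."""
    return [(i, t) for i, (r, t) in enumerate(zip(reference, target)) if r != t]

def decompress(reference: str, diffs: list[tuple[int, str]]) -> str:
    """Rebuild the target by applying the stored substitutions."""
    seq = list(reference)
    for i, base in diffs:
        seq[i] = base
    return "".join(seq)

ref = "ACGTACGTAC"
tgt = "ACGTTCGTAA"
diffs = compress(ref, tgt)
print(diffs)  # → [(4, 'T'), (9, 'A')] — two entries instead of ten bases
assert decompress(ref, diffs) == tgt
```

The closer the database genome is to the target, the shorter the diff list, which is exactly why a highly correlated reference drives the achievable compression rate down.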