Quantitative Approach to Information Processing

Center is defining core principles of multi-disciplinary information transfer.


“More concretely, we have been studying the fundamental limits, and computationally viable schemes for approaching those limits, in compression of genomic data given highly correlated genomic data already available on a database,” Weissman adds. “For example, compression of one individual’s genome, given that the genome of another individual from the same species is already on the database.”
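The idea of compressing one genome given another can be illustrated with a small sketch. This is not the center's method: it swaps in a general-purpose compressor (Python's standard zlib module, with its preset-dictionary option) and uses synthetic sequences rather than real genomic data. The function names, sequence length and mutation count are assumptions made for illustration only.

```python
import random
import zlib

BASES = b"ACGT"


def make_demo_pair(length=20000, n_mutations=20, seed=0):
    """Build a synthetic 'reference' and a 'target' that differs from it by
    a few point substitutions. Purely illustrative, not real genomic data.
    The length stays under deflate's 32 KiB window so the whole reference
    can serve as a dictionary."""
    rng = random.Random(seed)
    reference = bytes(rng.choice(BASES) for _ in range(length))
    target = bytearray(reference)
    for pos in rng.sample(range(length), n_mutations):
        # Replace the base with a different one so every mutation is real.
        target[pos] = rng.choice([b for b in BASES if b != target[pos]])
    return reference, bytes(target)


def compress_alone(target):
    """Compress the target with no side information."""
    return zlib.compress(target, 9)


def compress_given_reference(target, reference):
    """Compress the target using the reference as a preset dictionary,
    which both the sender and the receiver are assumed to hold."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def decompress_given_reference(blob, reference):
    """Recover the target; this requires the same reference."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    reference, target = make_demo_pair()
    alone = compress_alone(target)
    given = compress_given_reference(target, reference)
    assert decompress_given_reference(given, reference) == target
    print(f"standalone: {len(alone)} bytes, given reference: {len(given)} bytes")
```

Because the target differs from the reference in only a handful of positions, the compressor can encode most of it as pointers back into the shared reference, so the second figure should come out much smaller than the first. How close any practical scheme can get to the theoretical limit is the research question Weissman describes.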

Biology, especially, “is at a crossroads, and a better understanding of the field might come from understanding the information flow among cells,” Szpankowski says. “We haven’t designed biological systems--it’s already been built by natural selection--but we should be able to understand how information is passed from one cell to another cell. In this sense, information flow involves a much larger world than many people realize.”

Molecular biologists collect vast amounts of data, but aren’t always able to extract the information they need to confirm a hypothesis about how cells behave.

“These databases contain valuable information about how cells work and how diseases develop, but it is hard to find these needles in the haystack with current computational and analysis methods,” Szpankowski says. “We need the next level of computation and data analysis, which will come from a better understanding of information.”

Conventional research in computational biology focuses largely on identifying single markers associated with disease and phenotype, that is, the observable characteristics of an organism. “However, it is widely believed that such phenotypes result from emergent behavior, one that is better observed as an interacting sub-unit, rather than individually differentiated markers,” says Ananth Grama, professor of computer science at Purdue.

“The problem of identifying network signatures, while highly promising in theory, poses profound computational and data-related challenges,” Grama adds. “As part of the broader theme within the center on network modeling and analysis, we propose to develop novel models, methods and software that will fundamentally enhance our understanding of disease and phenotype, while potentially uncovering new targets for intervention.”

Doraiswami Ramkrishna, professor of chemical engineering at Purdue, hopes center researchers will be able to help him distill information that will enable him to predict specific gene expression within cellular metabolism, his field of study.

“We have lots of data, and I want to decipher from that data the information that will help me prove a theory,” he adds. “We want to know that we are correctly predicting which genes are expressing, and which are not. We need to be able to see whether what we predict is reflected in the data.”

In the larger communications arena, center researchers also are studying ways to ensure the timely delivery of messages, which is not always possible within today’s wireless networks. “The new challenge is to build protocols that will deliver the message within strict deadlines in a dynamically changing environment,” Szpankowski says. “We would like to understand how information is being transported in space, and can be delivered by a certain deadline.

“In order to send useful information, for example, when I am using my cell phone, we have to build a virtual connection between users,” he adds. “If we send too much information overhead, there is not enough bandwidth to send useful signals. The goal is to try to cut out the clutter, and to send enough information to construct a path between you and the other person within enough time to get the real information across.”

Finally, center researchers hope to design new ways to sift through and extract relevant knowledge from the flood of information that society encounters daily in countless information exchanges.

“When you query Google, for example, you get an answer, but are you getting the knowledge you need? What is knowledge? How is it created from information?” Szpankowski says. “We are dealing right now with huge datasets. Every company has a huge amount of data. The problem is: how do you extract relevant information from your quest? It’s not a problem to get AN answer. The problem is whether you get THE answer.”