How difficult would it be to learn who first Tweeted a nasty rumour about you, if it were spread by copy-and-paste rather than by re-tweeting?

When the Centers for Disease Control (CDC) needs to trace the source of an epidemic, do they need to interview the families of each victim?

A team led by Dr. Pedro Pinto has developed a vastly improved algorithm to make these tasks quicker and more efficient.

His report was published August 10, 2012, in the journal *Physical Review Letters.*

## How to Trace a Tweet More Efficiently

*Locating the Source of Diffusion in Large-Scale Networks*, by Pedro C. Pinto, Patrick Thiran and Martin Vetterli of the École Polytechnique Fédérale de Lausanne (EPFL) in Lausanne, Switzerland, describes an efficient way to trace the source of a message or infection back through the network by which it spread.

The paper includes statistical arguments to demonstrate the effectiveness and efficiency of their approach.

Efficiency is one important breakthrough; the new method does not require examining each and every node in the network. This means that the source can be found more quickly.

Consider the above image by Marc Smith that shows how the message “CSCW” was relayed among a number of Twitter users. Can you determine who sent the first message in the first place? Don’t worry – I couldn’t, either.

## Networks, Graphs, Trees and Computational Complexity

A network has nodes, such as e-mail users, that are connected by edges, such as knowing the e-mail address of one other user – Nodes may only communicate if they are connected.

A directed network only permits one-way transmission along any one edge; in order to reply, the first receiver would need to use another edge. However, a non-directed network allows two-way communication along one edge.

There are two main types of network: a tree or a graph. A graph is the general shape, which permits cycles. An example of a cycle would be that nodes A, B and C are fully connected to each other. The set of (two-way) edges is {(A,B), (A,C), (B,C)}.

A tree has a root and branches, but no cycles. Here the example is that node A connects directly to B and C, with no other path between B and C. The set of (two-way) edges is {(A,B), (A,C)}.

Pinto developed two algorithms: one to deal with “tree” networks; the other for “graph” networks.

Computational complexity” indicates how the number of steps required to solve the problem grows as the number of elements grows. Three important orders of complexity are “polynomial”, “exponential” and “factorial”.

Let ‘N’ be the number of elements in a problem. Polynomial complexity means that the difficulty increases as a power of ‘N’: whether ‘N’ or “N*log(N)”, “N^2″, “N^3″ or more. For example, a poor sorting algorithm may take “O(N^2)” (“on the order of N squared”) steps to sort ‘N’ elements. However, the best sorting algorithms can achieve “O(N*log(N))” (“on the order of N times the logarithm of N”).

Exponential complexity means that the number of elements is the exponent, rather than the base. For ‘x’ as a real number greater than one, “O(x^N)” has complexity “on the order of ‘x’ to the power ‘N’”, which grows faster than any polynomial for large ‘N’. Some algorithms to solve the “travelling salesman” problem are “O(x^N).

Factorial complexity is “O(N!)” (“on the order of ‘N’ factorial”). Other “travelling salesman” algorithms are “O(N!)”. Again, this is far worse than polynomial complexity.

Improving the order of computational complexity is a signficant achievement in computer programming. Proving that such an improvement can, or cannot, be made is extremely important in mathematics.

Decoding Science. One article at a time.