Internet Mysteries: How Much File Sharing Traffic Travels the Net? -- Update
How much of the traffic on the internet is peer-to-peer file trading?
Everyone seems to agree it represents a lot of the traffic, but the truth is no one knows (with the possible exception of the ISPs and backbone providers in the middle, and they aren't telling or sharing raw data).
One of the most recent reports on P2P traffic came from a traffic optimization firm called Ellacoya in June 2007. Its report said that HTTP-based web traffic had overtaken peer-to-peer traffic on the net, thanks to streaming media sites like YouTube.
Ellacoya, since acquired by Arbor Networks for its traffic-shaping technology, pegged HTTP traffic at 46 percent of the net's volume, with P2P traffic close behind at 37 percent. The company says the data was based on about 1 million North American broadband subscribers.
But little is known about when, how and where the company collected the data, or how it analyzed the packets.
Independent internet researchers, including KC Claffy of the Cooperative Association for Internet Data Analysis, ran their own tests in 2003 and 2004, following conflicting reports about whether file sharing was decreasing or increasing.
Using data from an internet backbone link in San Jose, California, the researchers found that P2P traffic was steady, if not increasing. For instance, BitTorrent's popularity roughly doubled from 2003 to 2004. But the researchers also found that P2P bits were getting harder to track, since P2P applications were increasingly using encryption and random ports, making it harder to quickly identify which application a packet came from.
The last time Sprint published an analysis of 30 large internet links (January 10, 2005), it found that file sharing accounted for less than 6 percent of the packets in the tube, with regular web traffic clocking in at more than 50 percent of the flow.
Speaking at the Supernova conference last July, Claffy expressed bewilderment that the government could hold a public policy debate about network management when no one except the network operators knows anything about traffic on the net:
"NBC filed something with the FCC using the Cache Logic study, done a year after the Pew Internet study saying that file sharing was dropping and our study showing file sharing was increasing. And the Cache Logic study just came out with a number (no trends, just that file sharing was 30 to 50 percent of traffic), and NBC uses that number (way old, no peer review, no methodology) to say, 'You guys, the FCC, have to start policing the network and getting this file sharing off the network.'"
All of the data out there is suspect.
The information is vital. Comcast claims that torrents of purloined pop music and movies are filling the internet's tubes, requiring it to block, divert and dam peer-to-peer traffic. And AT&T says it's going to create technology to detect such sharing by its customers.

Image: Sprint
In Washington, D.C., Congress is once again considering legislating rules for ISPs, while the five-member Federal Communications Commission is publicly wringing its hands over whether to fine or censure Comcast for its BitTorrent blocking and whether to adopt stricter net-neutrality guidelines generally.
For the next couple of weeks, Threat Level is taking a hard look at some of the unsolved mysteries of the internet. This is the first one.
We would love to know if good measurements of P2P traffic are out there or if, indeed, the debate over net neutrality is taking place without the slightest bit of verifiable data.
UPDATE: Ipoque, a P2P traffic management firm, released its own study of internet traffic in 2007, focusing on Germany, Australia, the Middle East, Eastern Europe and Southern Europe.
According to its report, P2P traffic accounted for between 49 and 83 percent of internet traffic in these regions. Using deep packet inspection techniques, the company says, it could identify the types of files being traded, as well as the hashes that uniquely identify individual files.
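Ipoque does not say how its inspection works. As a rough illustration of what payload signature matching can look like, here is a minimal Python sketch that recognizes a plaintext BitTorrent handshake, whose layout is given in the BitTorrent peer-wire protocol specification (BEP 3), and extracts its 20-byte info_hash, the SHA-1 value that identifies a torrent. The function name `extract_info_hash` and the sample bytes are hypothetical, and the sketch assumes you already have the opening bytes of a TCP payload; it is not Ipoque's method.

```python
from typing import Optional

# Layout of a plaintext handshake in the BitTorrent peer-wire protocol (BEP 3):
#   1 byte   pstrlen, always 19
#   19 bytes pstr, b"BitTorrent protocol"
#   8 bytes  reserved (extension flags)
#   20 bytes info_hash, the SHA-1 value identifying the torrent
#   20 bytes peer_id
PSTR = b"BitTorrent protocol"
RESERVED_LEN = 8
HASH_LEN = 20
HANDSHAKE_LEN = 1 + len(PSTR) + RESERVED_LEN + HASH_LEN + 20  # 68 bytes


def extract_info_hash(payload: bytes) -> Optional[str]:
    """Return the info_hash as hex if payload opens with a plaintext
    BitTorrent handshake, otherwise None (encrypted streams will not match)."""
    if len(payload) < HANDSHAKE_LEN:
        return None
    if payload[0] != len(PSTR) or payload[1:1 + len(PSTR)] != PSTR:
        return None
    start = 1 + len(PSTR) + RESERVED_LEN  # skip pstrlen, pstr and reserved bytes
    return payload[start:start + HASH_LEN].hex()


# Example: a synthetic handshake with a made-up info_hash and peer_id.
fake_hash = bytes(range(20))
handshake = bytes([19]) + PSTR + bytes(RESERVED_LEN) + fake_hash + b"-XX0001-" + bytes(12)
print(extract_info_hash(handshake))  # 000102030405060708090a0b0c0d0e0f10111213
print(extract_info_hash(b"GET / HTTP/1.1\r\n"))  # None: not a BitTorrent handshake
```

Because the check needs the handshake in the clear, it returns None for encrypted connections, which is the same blind spot the San Jose backbone researchers described.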
For instance, in the Middle East, the most popular BitTorrent audio download was Beyoncé's "Listen," according to Ipoque. (Does that mean American foreign policy is winning or losing?)
The study is unlikely to please internet scientists, since the data set is not public and there is little discussion of how the numbers were arrived at.
Photo: Star5112




I did a lot of work in this area several years ago. The difficulty is that differentiating traffic by port number is inherently flawed. While certain services are traditionally found on specific ports (web services on 80, ssh on 22, etc.), there is *no* specific reason why any particular service *has* to run on that port. You can put web services on port 22, ssh on port 2222, ftp on 53621, and so forth. That has a strong tendency to skew data. You also have applications that open control connections on well-known ports but spawn connections on ephemeral ports for data transfers, which tends to diffuse the data across a large number of unrelated ports. You also have a lot of bulk file transfers over HTTP which, although within the spec, sort of blurs the line when it comes to file sharing.
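To make the commenter's point concrete, here is a minimal Python sketch of a naive port-based classifier. The port table, the flow list and the function names are hypothetical; the nonstandard ports (web on 22, ssh on 2222, ftp on 53621) come from the comment, and the P2P flows on port 80 and on a random high port stand in for the port-hopping the article describes.

```python
from typing import Dict, List, Tuple

# Hypothetical port table of the kind a naive classifier relies on.
WELL_KNOWN_PORTS: Dict[int, str] = {
    21: "ftp",
    22: "ssh",
    80: "web",
    443: "web",
    6881: "bittorrent",
}


def classify_by_port(dst_port: int) -> str:
    """Label a flow by destination port alone; 'unknown' if the port is unlisted."""
    return WELL_KNOWN_PORTS.get(dst_port, "unknown")


# Hypothetical flows: (application actually running, destination port it used).
FLOWS: List[Tuple[str, int]] = [
    ("web", 80),            # conventional port: labeled correctly
    ("web", 22),            # web server moved to 22: mislabeled as ssh
    ("ssh", 2222),          # ssh on a nonstandard port: unknown
    ("ftp", 53621),         # FTP data on an ephemeral port: unknown
    ("bittorrent", 80),     # P2P client configured to use 80: mislabeled as web
    ("bittorrent", 49152),  # P2P on a random high port: unknown
]


def accuracy(flows: List[Tuple[str, int]]) -> float:
    """Fraction of flows whose port-based label matches the real application."""
    hits = sum(1 for app, port in flows if classify_by_port(port) == app)
    return hits / len(flows)


for app, port in FLOWS:
    print(f"{app:>10} on port {port:>5} -> {classify_by_port(port)}")
print(f"accuracy: {accuracy(FLOWS):.2f}")
```

On this toy input only one flow in six gets the right label. Real traffic mixes would score differently, but the failure modes are the ones the comment lists: relocated services, ephemeral data ports and applications deliberately hiding on popular ports.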
There are some ways to 'fingerprint' data transfer patterns on a macroscopic level that *may* let you determine rough proportions of different methods of bulk data distribution (p2p, point to point, web, etc.), but that's about it.
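One way to picture this kind of macroscopic fingerprinting is to look at connection patterns instead of ports or payloads. The sketch below is an illustrative assumption, not a method described in the article or the comment: it flags a server endpoint as P2P-like when the number of distinct remote IPs connecting to it roughly equals the number of distinct remote source ports, on the theory that web browsers often open several parallel connections per client while peers tend to connect once each. The flow-record format, the thresholds and the addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flow record: (src_ip, src_port, dst_ip, dst_port).
Flow = Tuple[str, int, str, int]


def p2p_like_endpoints(flows: Iterable[Flow], min_peers: int = 3,
                       tolerance: int = 1) -> Set[Tuple[str, int]]:
    """Flag destination endpoints whose connection pattern looks peer-to-peer.

    Assumed heuristic: a web server sees many more distinct source ports than
    distinct source IPs (parallel connections per client), while a P2P
    endpoint sees roughly one source port per remote IP.
    """
    ips: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    ports: Dict[Tuple[str, int], Set[int]] = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in flows:
        ips[(dst_ip, dst_port)].add(src_ip)
        ports[(dst_ip, dst_port)].add(src_port)
    return {
        endpoint
        for endpoint in ips
        if len(ips[endpoint]) >= min_peers
        and len(ports[endpoint]) - len(ips[endpoint]) <= tolerance
    }


# Three web clients, each opening four parallel connections to one server.
web = [(f"192.0.2.{c}", 40000 + 10 * c + k, "10.0.0.1", 80)
       for c in range(1, 4) for k in range(4)]
# Four peers, each connecting once to a P2P endpoint.
p2p = [(f"198.51.100.{p}", 50000 + p, "10.0.0.2", 6881) for p in range(1, 5)]
print(p2p_like_endpoints(web + p2p))  # {('10.0.0.2', 6881)}
```

Thresholds like `min_peers` and `tolerance` would need tuning against labeled traffic, and, as the comment says, the output is a rough proportion across many endpoints rather than a verdict on any single flow.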
Are people working with incomplete data? Yes. Is there better information out there? Sort of. It's not super easy to get and still requires a lot of guesswork and interpretation. Such is the nature of the beast, (un)fortunately.