What is Entropy of WWW?
Milan Kunz
63800 Brno, Jurkovičova 13, Czech Republic
Internet with its WWW is such a fascinating phenomenon that it requires apparently its own theory. Barabasi and Albert (1) analyzed WWW a found that the connections in WWW do not correspond to frequencies expected from Erdos Renyi theory of random graphs. This is not surprising, since WWW is not growing by purely stochastic processes but due to efforts of its weavers.
Observed statistics of WWW is not unexpected. 50 years ago Shannon (2) formulating the theory of communication observed, that symbols in texts of natural languages are not used optimally for transporting messages but their distribution is extremely skewed. He considered this phenomenon as a nuisance and named it redundancy. This phenomenon was much speculated about (3).
Information scientists collected thousands of statistics (statistical linguistics, informatics, scientometrics) similar the statistics (1) of the WWW. All these distributions are extremely skewed and in their crude and simplest form can all be expressed by a power law correlation (4). This was considered specific for information distributions (5), since the basic distributions of statistical mechanics are of another type.
Barabasi and Albert did not analyzed their data thoroughly. Otherwise, they could observe a rank correlation of largest centers of WWW of similar type (log-log), and a selfcorrelation of the number of links of the largest providers having the ranks r with the number of members of WWW with only mk links when r = mk. This relation would be of the same type, again (6).
There seems to be a difference between a net and simple counts of symbols or whole words in texts, counts of papers produced by different authors collected in bibliographies or citations.
For analyzing information can be used matrix formalism (7). The WWW is similar to a huge electrical circuit where instead electrons information flows. Actually, this information is coded by bits of electrons, which makes a difference to the brain, where information is transported in form of molecules.
It is well known that theory of electrical circuits was elaborated by Kirchhoff. The basic matrix corresponding to an electrical circuit is known as Laplace-Kirchhoff matrix STS. The matrix STS is the quadratic form of the incidence matrix S (sij = -1, if the i-th arc (link, connection) goes out of the vertex j, sij = 1, if the i-th arc (link, connection) goes into the vertex j, sij = 0 otherwise, the arcs can be weighted).
The Laplace-Kirchhoff matrix can be decomposed into the diagonal matrix V of the vertex degrees vij and the adjacency matrix A.
Barabasi and Albert analyzed only the diagonal matrix V of the vertex degrees vij.
The Laplace-Kirchhoff matrix STS is singular and has just one zero eigenvalue. It has a generalized inverse, which can be found by differentiating the matrix STS according to its vertices, inverting these differences and summing these n inverse matrices (9). The difference of a matrix is found by eliminating the cross of the j-th row and column from it.
This differentiation gives a formal base for splitting of graphs with cycles into forests of n stars.
The off-diagonal elements of this cross form an adjacency matrix A of a star of arcs of the vertex j, if empty places are zeroes by default. From these matrices can be eliminated eventually empty rows and columns.
The set of the matrices of stars is arranged into the diagonal block matrix of the size n2. There are n places for the original n vertices and the remaining n(n-1) places are for each arc of the complete graph, since each arc is counted twice by its ending vertices. If the graph is not a complete graph, the size of the matrix reduces to the dimension (n + 2m).
Eventually, the Laplace-Kirchhoff matrix STS can be split into two sets of star forests, one counting arcs going into the vertices, the other counting arcs going out of the vertices. This requires somewhat more complicated formal matrix operations.
The equation giving the number of star forests of the given type was found long ago (9). The enumerator contains the product of two polynomial coefficients for n and m permutations (permutations columns and rows of the corresponding matrix, respectively), and an additional term for permutations between centers of the stars and their leaves.
There were tried many equations for correlating extremely skewed distributions of information, from the simplest form used by Barabasi and Albert, till to the sophisticated moreparametric distribution proposed by Sichel (10).
Quite satisfactory results gives the lognormal distribution which was used at first by Shockey (11). There are some technical difficulties connected with the truncation of the distributions at the frequency 1, but the distribution can be evaluated as a linear correlation (12).
The correlations usually deviate from linearity at low frequencies, where higher counts are observed than expected. The correlation can be improved using an exotical substitution for frequencies y = log2(mk+1)!. The double logarithming gives for 1 minus infinity whereas loglog2(1+1)! = 0, a suitable truncation.
Remembering that the amount of information should be measured on a logarithmic scale, this leads to a conclusion that the information size of vertices of WWW can be approximated by the normal (Gaussian) distribution.
Another explanation is that we observe distribution of information entropy.
Boltzmann connected thermodynamical entropy with the logarithmic measure of probability (13). He used specifically in the case of ideal gas molecules the polynomial coefficient n!/S P nk! for n permutations, nk being the number of molecules having the same number of quanta of energy (sic, 30 years before Planck). Using analogy with the formula for energy of photons, nk is the number of information vectors having the same frequency mk in the set.
Using a brute force solution, we found that this polynomial coefficient is maximal, if all nk are equal, as best if all nk = 1 (14).The second polynomial coefficient for m permutations has the form m!/P mj!, the index j going from 1 till n, n being the number of vectors. Usually there are more information vectors nk having the same frequency mk. The logarithm of this coefficient nklog mk! -log m! can be compared with the Shannon measure of entropy (2,9), since it gives the count of different strings which can formed from the given set of symbols by permutations of their ordering. Both polynomial coefficients have maxima for different shapes of distribution. The polynomial coefficient m!/P mj! is maximal, if all mj are equal, as best if all mj = 1. Then nk = n and the polynomial coefficient for n permutations is just 1, and corresponding entropy is zero.
Splitting the entropy increments nklog mj! as nk = f(log mj!), it was found that the information entropy has lognormal distribution.
It is difficult to prove that the observed extremely skewed distributions of information indeed give maximal possible entropy, since there could be some additional terms. The maximal possible entropy would mean that the existing distributions of information were the most probable ones.
References and Notes
1. A-L. Barabasi, R. Albert, Science, 286, 509-512 (1999).
2. C. E. Shannon, Bell System Techn.J., 27, 379, 623 (1948).
3. L. Brillouin, Science and Information Theory, Academic Press, New York, 1956.
4. See for example the special issue of Czech. J. Phys. B36, No. 1, edited by J. Vlachý and dedicated to the sixty anniversary of the Lotka Law, and the series of papers of J. Vlachý in this journal.
5. S.D. Haitun, Scientometrics, 4, 5-25, 89-10, 5, 375-395. (1982), M. Kunz, Scientometrics, 13, 289-297 (1988).
6. M. Kunz, Scientometrics, 13, 289-297 (1988).
7. M. Kunz, J. Chem. Inform. Comput. Sci., 33, 193-196 (1993).
8. M. Kunz, J. Math. Chem., 9, 297-305 (1992).
9. M. Kunz, Coll. Czech. Chem. Commun., 51, 1856-1863 (1986).
10. H.S. Sichel, J. Am. Soc. Inform. Science, 31, 314-20 (1980).
11. W. Shockey, Proceedings IRE, 45, 379 (1957).
M. Kunz, Scientometrics, 18, 179-191 (1990).
12. M. Kunz, Scientometrics, 18, 179-191 (1990).
13. L. Boltzmann, Wiener Berichte 76, 373 (1877).
14. This solution gives very high means, especially for gas molecules.