fitmle - Fit a set of values with a power-law distribution
fitmle data_in [tol [TEST [num_test]]]
fitmle fits the data points contained in the file data_in with a
power-law function P(k) ~ k^(-gamma), using the Maximum-Likelihood
Estimator (MLE). In particular, fitmle finds the exponent gamma
and the smallest input value k_min from which the power-law
behaviour holds. The second (optional) argument tol sets
the acceptable statistical error on the estimate of the exponent.
If TEST is provided, the program associates a p-value with the
goodness of the fit, based on the Kolmogorov-Smirnov statistic
computed on num_test distributions sampled from the theoretical
power-law function. If num_test is not provided, the test is based
on 100 sampled distributions.
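For continuous data, the maximum-likelihood estimate of the exponent has the closed form gamma = 1 + n / sum(ln(k_i / k_min)), as given by Clauset, Shalizi and Newman. The Python sketch below illustrates that formula only. It is not the source code of fitmle, the function name mle_exponent is hypothetical, and it is an assumption (not confirmed by this page) that the standard error it returns is the quantity bounded by tol.

```python
import math

def mle_exponent(values, k_min):
    """Continuous-case MLE of the power-law exponent (illustrative sketch).

    Only the values >= k_min (the tail) take part in the fit.
    Returns (gamma, sigma), where sigma = (gamma - 1) / sqrt(n) is the
    standard error of the estimate.
    """
    tail = [v for v in values if v >= k_min]
    n = len(tail)
    log_sum = sum(math.log(v / k_min) for v in tail)
    if n == 0 or log_sum == 0.0:
        # No tail, or every tail value equals k_min: the estimate is undefined.
        raise ValueError("cannot estimate the exponent from this tail")
    gamma = 1.0 + n / log_sum
    sigma = (gamma - 1.0) / math.sqrt(n)
    return gamma, sigma
```

The discrete case (used when all input values are integers) does not have the same closed form, so this sketch should not be expected to reproduce the numbers printed by fitmle on integer data.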
data_in
Set of values to fit. If data_in is equal to - (dash), read the set
from STDIN.
tol
The acceptable statistical error on the estimation of the exponent. If omitted, it is set to 0.1.
TEST
If the third parameter is TEST, the program computes an estimate
of the p-value associated with the best fit, based on num_test
synthetic samples of the same size as the input set.
num_test
Number of synthetic samples to use for the estimation of the p-value of the best fit. If omitted, 100 samples are used.
If fitmle is given fewer than three parameters (i.e., if TEST is
not specified), the output is a line in the format:
gamma k_min ks
where gamma is the estimate for the exponent, k_min is the
smallest of the input values for which the power-law behaviour holds,
and ks is the value of the Kolmogorov-Smirnov statistic of the
best fit.
If TEST is specified, the output line also contains the estimate of
the p-value of the best fit:
gamma k_min ks p-value
where p-value is based on num_test samples (or just 100, if
num_test is not specified) of the same size as the input, drawn
from the theoretical power-law function computed as a best fit.
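When fitmle is called from a script, the output line described above can be split into named fields. The following Python sketch assumes only the two formats documented here (three fields without TEST, four with TEST); the function name parse_fitmle_line and the dictionary keys are hypothetical.

```python
def parse_fitmle_line(line):
    """Parse one line of fitmle output: 'gamma k_min ks [p-value]'."""
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 fields, got %d" % len(fields))
    result = {
        "gamma": float(fields[0]),
        "k_min": float(fields[1]),
        "ks": float(fields[2]),
        # The p-value is present only when TEST was given.
        "p_value": float(fields[3]) if len(fields) == 4 else None,
    }
    return result
```

The message about the fitting procedure goes to STDERR, so it does not reach a parser that reads only STDOUT.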
Let us assume that the file AS-20010316.net_degs contains the degree
sequence of the data set AS-20010316.net (the graph of the Internet
at the AS level in March 2001). The exponent of the best-fit power-law
distribution can be obtained by using:
$ fitmle AS-20010316.net_degs
Using discrete fit
2.06165 6 0.031626
$
where 2.06165 is the estimated value of the exponent gamma, 6 is
the minimum degree value for which the power-law behaviour holds, and
0.031626 is the value of the Kolmogorov-Smirnov statistic of the
best fit. The program also reports that it chose the
discrete fitting procedure, since all the values in
AS-20010316.net_degs are integers. This message is printed
to STDERR.
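The choice between the discrete and the continuous procedure depends on whether every input value is an integer. A minimal Python sketch of such a check is given below; the function name use_discrete_fit is hypothetical, and the exact rule applied by fitmle (for instance, how it treats a value written as 3.0) is an assumption.

```python
def use_discrete_fit(tokens):
    """Return True if every token parses as a whole number.

    Assumption: a value such as '3.0' counts as an integer here;
    fitmle itself may decide differently.
    """
    return all(float(tok).is_integer() for tok in tokens)
```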
It is possible to compute the p-value of the estimate by running:
$ fitmle AS-20010316.net_degs 0.1 TEST
Using discrete fit
2.06165 6 0.031626 0.17
$
which provides a p-value equal to 0.17, meaning that 17% of the
synthetic samples showed a value of the KS statistic larger than that
of the best fit. The estimation of the p-value here is based on 100
synthetic samples, since num_test was not provided. If we allow a
slightly larger statistical error on the estimate of the
exponent gamma, we obtain different values of gamma and k_min,
and a much higher p-value:
$ fitmle AS-20010316.net_degs 0.15 TEST 1000
Using discrete fit
2.0585 19 0.0253754 0.924
$
Notice that in this case, the p-value of the estimate is equal to 0.924, and is based on 1000 synthetic samples.
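The p-values in the examples above are the fraction of synthetic samples whose KS statistic exceeds that of the best fit. The Python sketch below illustrates that idea for the continuous case only. It is not the code of fitmle: the function names are hypothetical, the samples are drawn by inverse-transform sampling, and each sample is compared with the fitted law without being refitted, which is a simplifying assumption.

```python
import random

def ks_distance(tail, k_min, gamma):
    """Largest gap between the empirical CDF of tail and the
    continuous power-law CDF 1 - (k / k_min)^(1 - gamma)."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - (x / k_min) ** (1.0 - gamma)
        # Compare with the empirical CDF just before and just after x.
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

def sample_power_law(n, k_min, gamma, rng):
    """Draw n values >= k_min by inverse-transform sampling."""
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

def p_value(ks_observed, n, k_min, gamma, num_test=100, seed=None):
    """Fraction of num_test synthetic samples with KS > ks_observed."""
    rng = random.Random(seed)
    larger = 0
    for _ in range(num_test):
        sample = sample_power_law(n, k_min, gamma, rng)
        if ks_distance(sample, k_min, gamma) > ks_observed:
            larger += 1
    return larger / num_test
```

With num_test synthetic samples the p-value is resolved in steps of 1/num_test, which is why the run with 1000 samples reports three decimal digits while the run with 100 samples reports two.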
deg_seq(1), power_law(1)
A. Clauset, C. R. Shalizi, and M. E. J. Newman. "Power-law distributions in empirical data". SIAM Rev. 51, (2009), 661-703.
V. Latora, V. Nicosia, and G. Russo. "Complex Networks: Principles, Methods and Applications", Chapter 5. Cambridge University Press (2017).
(c) Vincenzo 'KatolaZ' Nicosia 2009-2017 <v.nicosia@qmul.ac.uk>.