Diffstat (limited to 'doc/fitmle.1')
-rw-r--r-- doc/fitmle.1 136
1 files changed, 136 insertions, 0 deletions
diff --git a/doc/fitmle.1 b/doc/fitmle.1
new file mode 100644
index 0000000..75a43d1
--- /dev/null
+++ b/doc/fitmle.1
@@ -0,0 +1,136 @@
+.\" generated with Ronn/v0.7.3
+.\" http://github.com/rtomayko/ronn/tree/0.7.3
+.
+.TH "FITMLE" "1" "September 2017" "www.complex-networks.net" "www.complex-networks.net"
+.
+.SH "NAME"
+\fBfitmle\fR \- Fit a set of values with a power\-law distribution
+.
+.SH "SYNOPSIS"
+\fBfitmle\fR \fIdata_in\fR [\fItol\fR [TEST [\fInum_test\fR]]]
+.
+.SH "DESCRIPTION"
+\fBfitmle\fR fits the data points contained in the file \fIdata_in\fR with a power\-law function P(k) ~ k^(\-gamma), using the Maximum\-Likelihood Estimator (MLE)\. In particular, \fBfitmle\fR finds the exponent \fBgamma\fR and the smallest of the input values, \fBk_min\fR, for which the power\-law behaviour holds\. The second (optional) argument \fItol\fR sets the acceptable statistical error on the estimate of the exponent\.
+.
+.P
+If \fBTEST\fR is provided, the program associates a p\-value with the goodness of the fit, based on the Kolmogorov\-Smirnov statistic computed on \fInum_test\fR samples drawn from the theoretical power\-law function\. If \fInum_test\fR is not provided, the test is based on 100 samples\.
+.
+.SH "PARAMETERS"
+.
+.TP
+\fIdata_in\fR
+Set of values to fit\. If \fIdata_in\fR is equal to \fB\-\fR (dash), read the set from STDIN\.
+.
+.TP
+\fItol\fR
+The acceptable statistical error on the estimate of the exponent\. If omitted, it is set to 0\.1\.
+.
+.TP
+TEST
+If the third parameter is \fBTEST\fR, the program computes an estimate of the p\-value associated with the best fit, based on \fInum_test\fR synthetic samples of the same size as the input set\.
+.
+.TP
+\fInum_test\fR
+Number of synthetic samples to use for the estimation of the p\-value of the best fit\. If omitted, 100 samples are used\.
+.
+.SH "OUTPUT"
+If \fBfitmle\fR is given fewer than three parameters (i\.e\., if \fBTEST\fR is not specified), the output is a line in the format:
+.
+.IP "" 4
+.
+.nf
+
+ gamma k_min ks
+.
+.fi
+.
+.IP "" 0
+.
+.P
+where \fBgamma\fR is the estimate of the exponent, \fBk_min\fR is the smallest of the input values for which the power\-law behaviour holds, and \fBks\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\.
+.
+.P
+If \fBTEST\fR is specified, the output line also contains the estimate of the p\-value of the best fit:
+.
+.IP "" 4
+.
+.nf
+
+ gamma k_min ks p\-value
+.
+.fi
+.
+.IP "" 0
+.
+.P
+where \fBp\-value\fR is based on \fInum_test\fR samples (or 100, if \fInum_test\fR is not specified) of the same size as the input, drawn from the theoretical power\-law function computed as the best fit\.
+.
+.SH "EXAMPLES"
+Let us assume that the file \fBAS\-20010316\.net_degs\fR contains the degree sequence of the data set \fBAS\-20010316\.net\fR (the graph of the Internet at the AS level in March 2001)\. The exponent of the best\-fit power\-law distribution can be obtained by using:
+.
+.IP "" 4
+.
+.nf
+
+ $ fitmle AS\-20010316\.net_degs
+ Using discrete fit
+ 2\.06165 6 0\.031626
+ $
+.
+.fi
+.
+.IP "" 0
+.
+.P
+where \fB2\.06165\fR is the estimated value of the exponent \fBgamma\fR, \fB6\fR is the minimum degree value for which the power\-law behaviour holds, and \fB0\.031626\fR is the value of the Kolmogorov\-Smirnov statistic of the best fit\. The program also reports that it used the discrete fitting procedure, since all the values in \fBAS\-20010316\.net_degs\fR are integers\. This message is printed to STDERR\.
+.
+.P
+It is possible to compute the p\-value of the estimate by running:
+.
+.IP "" 4
+.
+.nf
+
+ $ fitmle AS\-20010316\.net_degs 0\.1 TEST
+ Using discrete fit
+ 2\.06165 6 0\.031626 0\.17
+ $
+.
+.fi
+.
+.IP "" 0
+.
+.P
+which provides a p\-value equal to 0\.17, meaning that 17% of the synthetic samples showed a value of the KS statistic larger than that of the best fit\. The estimation of the p\-value here is based on 100 synthetic samples, since \fInum_test\fR was not provided\. If we allow a slightly larger statistical error on the estimate of the exponent \fBgamma\fR, we obtain different values of \fBgamma\fR and \fBk_min\fR, and a much higher p\-value:
+.
+.IP "" 4
+.
+.nf
+
+ $ fitmle AS\-20010316\.net_degs 0\.15 TEST 1000
+ Using discrete fit
+ 2\.0585 19 0\.0253754 0\.924
+ $
+.
+.fi
+.
+.IP "" 0
+.
+.P
+Notice that in this case, the p\-value of the estimate is equal to 0\.924, and is based on 1000 synthetic samples\.
+.
+.SH "SEE ALSO"
+deg_seq(1), power_law(1)
+.
+.SH "REFERENCES"
+.
+.IP "\(bu" 4
+A\. Clauset, C\. R\. Shalizi, and M\. E\. J\. Newman\. "Power\-law distributions in empirical data"\. SIAM Rev\. 51 (2009), 661\-703\.
+.
+.IP "\(bu" 4
+V\. Latora, V\. Nicosia, and G\. Russo\. "Complex Networks: Principles, Methods and Applications", Chapter 5\. Cambridge University Press (2017)\.
+.
+.IP "" 0
+.
+.SH "AUTHORS"
+(c) Vincenzo \'KatolaZ\' Nicosia 2009\-2017 \fB<v\.nicosia@qmul\.ac\.uk>\fR\.
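
The quantities reported by the man page above (`gamma`, `ks`) can be illustrated with a minimal Python sketch. This is not the `fitmle` implementation: it assumes continuous data and a `k_min` that is already known, and it uses the continuous-case MLE formula from Clauset et al. rather than the discrete fit mentioned in the examples. The function names `mle_exponent`, `ks_statistic`, and `sample_power_law` are hypothetical, and how `fitmle` itself uses `tol` when selecting `k_min` is not covered here.

```python
import math
import random


def mle_exponent(values, k_min):
    """Continuous-case MLE of gamma for the values >= k_min.

    Returns (gamma, std_err), where std_err = (gamma - 1) / sqrt(n)
    is the asymptotic standard error of the estimate.
    """
    tail = [v for v in values if v >= k_min]
    n = len(tail)
    log_sum = sum(math.log(v / k_min) for v in tail)
    gamma = 1.0 + n / log_sum
    return gamma, (gamma - 1.0) / math.sqrt(n)


def ks_statistic(values, k_min, gamma):
    """Largest distance between the empirical and fitted CDFs of the tail."""
    tail = sorted(v for v in values if v >= k_min)
    n = len(tail)
    dist = 0.0
    for i, v in enumerate(tail):
        # CDF of a continuous power law with lower cutoff k_min
        model_cdf = 1.0 - (v / k_min) ** (1.0 - gamma)
        # The empirical CDF jumps from i/n to (i+1)/n at v; check both sides
        dist = max(dist, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return dist


def sample_power_law(n, k_min, gamma, rng):
    """Draw n values from P(k) ~ k^(-gamma), k >= k_min, by inverse transform."""
    return [k_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]


# Synthetic check: sample with a known exponent, then recover it.
rng = random.Random(42)
data = sample_power_law(5000, 1.0, 2.5, rng)
gamma, std_err = mle_exponent(data, 1.0)
ks = ks_statistic(data, 1.0, gamma)
print(gamma, std_err, ks)
```

A p-value in the spirit of the `TEST` option could be estimated by repeating `sample_power_law` with the fitted parameters and counting how often the synthetic KS statistic exceeds the observed one; that loop is omitted to keep the sketch short.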