field | value | date
---|---|---
author | KatolaZ <katolaz@freaknet.org> | 2018-08-01 10:27:54 +0100
committer | KatolaZ <katolaz@freaknet.org> | 2018-08-01 10:27:54 +0100
commit | 11856792221c7faa2f423bc106d1f4b1482bcdb8 |
tree | 10f5bb3a40ab184272efb31bfecc3bcc3da136e9 /README.md |
parent | 040ba18f7fd17b4d6dc3a93549a19263fb0b8a95 |
new url_to_id and added dry-run in burrow
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 29 |
1 files changed, 29 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..be73adc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,29 @@
+## Burrow-The-Burrows
+
+A Gopher burrower in a shell script. By using `burrow` and a bit of
+plumbing you can get all the links in a Gopher MENU, recursively visit
+all the available subdirs, and create a directed graph of the visited
+selectors.
+
+`burrow` takes as input a gopher identifier, as generated by
+`url_to_id`, which is considered a gophermap, and provides on stdout the
+list of menu selectors found in that document. `burrow` will also dump
+on stderr the list of all the edges (to any kind of selector) found in
+that page, in the format:
+
+ src_SHA256 dst_SHA256
+
+where `src_SHA256` is the SHA256 of the source selector (the current
+document), while `dst_SHA256` is the destination selector (the pointed
+document).
+
+To start a crawl, one can do something like:
+
+```
+ $ ./url_to_id gopher://your.gopher.url/ > ids
+ $ tail -f ids | parallel -j2 './burrow {}' 2>> graph.txt | tee -a ids >/dev/null &
+```
+
+Notice that `burrow` will create a certain number of folders in the
+current directory, used to keep track of the selectors that have been
+already retrieved.
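Since the README describes the edge list as plain `src_SHA256 dst_SHA256` lines collected in `graph.txt`, a quick sanity check of a crawl could look like the sketch below. `graph_summary` is a hypothetical helper, not part of this commit. It assumes one whitespace-separated edge per line, and it deduplicates first in case the same edge is reported more than once.

```shell
#!/bin/sh
# Hypothetical helper (not part of burrow): summarise an edge list in the
# "src_SHA256 dst_SHA256" format that burrow is described as writing to stderr.
graph_summary() {
    # $1: path to the edge file, e.g. graph.txt from the crawl command
    sort -u "$1" | awk '
        NF == 2 {
            edges++            # one unique directed edge
            nodes[$1] = 1      # source selector hash
            nodes[$2] = 1      # destination selector hash
        }
        END {
            n = 0
            for (k in nodes) n++
            printf "nodes=%d edges=%d\n", n, edges
        }'
}
```

Running `graph_summary graph.txt` while the crawl is in progress gives a rough count of the selectors and links seen so far. Lines that do not have exactly two fields are ignored.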