Published on May 30th, 2018
Diskover – File System Crawler, Storage Search Engine And Analytics Powered By Elasticsearch
Screenshots
diskover-web (diskover's web file manager, analytics app, file system search engine, rest-api)
Kibana dashboards/saved searches/visualizations and support for Gource
Diskover Gource videos
Installation Guide
Requirements
Linux or OS X/macOS (tested on OS X 10.11.6, Ubuntu 16.04)
Python 2.7 or Python 3.5/3.6 (tested on Python 2.7.14, 3.5.3, 3.6.4)
Python elasticsearch client module
Python requests module
Python scandir module
Python progressbar2 module
Python redis module
Python rq module
Elasticsearch 5 (local or AWS ES Service, tested on Elasticsearch 5.4.2, 5.6.4). Elasticsearch 6 is not supported yet.
Redis (tested on 4.0.8)
Install the above Python modules using pip.
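After installing, it can help to confirm that every module in the list is importable before starting a crawl. The following is a minimal sketch, not part of diskover, and it assumes Python 3 (`importlib.util` is not available on Python 2.7). The helper name `missing_modules` is hypothetical. Note that the progressbar2 package is imported under the name `progressbar`.

```python
import importlib.util

# Import names for the modules listed above. The progressbar2 package
# is imported as "progressbar"; the others match their package names.
REQUIRED_MODULES = ["elasticsearch", "requests", "scandir", "progressbar", "redis", "rq"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any name the script reports as missing can then be installed with pip.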
Optional Installs
- diskover-web (diskover's web file manager and analytics app)
- Redis RQ Dashboard (for monitoring redis queue)
- sharesniffer (for scanning your network for file shares and auto-mounting for crawls)
- Kibana (for visualizing Elasticsearch data, tested on Kibana 5.4.2, 5.6.4)
- X-Pack (Kibana plugin for graphs, reports, monitoring and http auth)
- Gource (for Gource visualizations of diskover Elasticsearch data, see videos above)
Download
$ git clone https://github.com/shirosaidev/diskover.git
$ cd diskover
Requirements
You need at least Python 2.7 or Python 3.5, with the required Python dependencies installed using pip:
$ pip install -r requirements.txt
Getting Started
Copy the diskover config diskover.cfg.sample to diskover.cfg and edit it for your environment.
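If you script your setup, the copy step can be made safe to repeat so that an already edited diskover.cfg is never overwritten. This is a small sketch under that assumption. The function name `ensure_config` is hypothetical and not part of diskover.

```python
import os
import shutil


def ensure_config(directory="."):
    """Copy diskover.cfg.sample to diskover.cfg unless diskover.cfg exists.

    Returns True if a new config was created, False if one was already there.
    """
    sample = os.path.join(directory, "diskover.cfg.sample")
    target = os.path.join(directory, "diskover.cfg")
    if os.path.exists(target):
        return False  # leave an existing, possibly edited, config untouched
    shutil.copyfile(sample, target)
    return True
```

Calling `ensure_config("/path/with/diskover")` creates the config on the first run only. You still need to edit the resulting file by hand.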
Start the diskover worker bots (as many as you want; a good number might be cores x 2) with:
$ cd /path/with/diskover
$ python diskover_worker_bot.py
Worker bots can be added during a crawl to help with the queue. To run a worker bot in burst mode (quit after all jobs are done), use the -b flag. If the queue is empty, these bots will exit, so use rq info or rq-dashboard to check whether they are still running. Run diskover-bot-launcher.sh to spawn and kill multiple bots.
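The "cores x 2" rule of thumb and the -b flag described above can be sketched in Python as follows. This is only an illustration of the idea, not how diskover-bot-launcher.sh works. The function names are hypothetical, and the sketch assumes it is run from the diskover directory so that diskover_worker_bot.py resolves.

```python
import multiprocessing
import subprocess
import sys


def suggested_bot_count(cores=None):
    """Return cores x 2, the rule of thumb for the number of worker bots."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return cores * 2


def bot_command(burst=False):
    """Build the command line for one worker bot; -b requests burst mode."""
    cmd = [sys.executable, "diskover_worker_bot.py"]
    if burst:
        cmd.append("-b")
    return cmd


def launch_bots(count=None, burst=False):
    """Start `count` worker bots as child processes and return their handles."""
    if count is None:
        count = suggested_bot_count()
    return [subprocess.Popen(bot_command(burst)) for _ in range(count)]
```

For example, `launch_bots(burst=True)` would start twice as many bots as there are CPU cores, each quitting once the queue is drained.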
Start diskover main job dispatcher and file tree crawler with:
$ python /path/to/diskover.py -d /rootpath/you/want/to/crawl -i diskover-indexname -a
With no flags, a crawl indexes from . (the current directory) and includes files larger than 0 bytes, with a modified-time threshold of 0 days. Empty files and directories are skipped (unless you use the -s 0 and -e flags). Use -h to see the CLI options.
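To make the flag combinations concrete, here is a small sketch that assembles a crawl command from the options mentioned in this guide (-d, -i, -s and -e). The function `crawl_command` and its parameter names are hypothetical. It only builds the argument list and does not run anything, and it does not cover flags this guide leaves unexplained, such as -a.

```python
import sys


def crawl_command(rootpath=".", index=None, min_size=None,
                  include_empty_dirs=False, script="diskover.py"):
    """Build an argument list for diskover.py from the flags described above."""
    cmd = [sys.executable, script, "-d", rootpath]
    if index is not None:
        cmd += ["-i", index]          # index name, e.g. diskover-indexname
    if min_size is not None:
        cmd += ["-s", str(min_size)]  # -s 0 stops empty files being skipped
    if include_empty_dirs:
        cmd.append("-e")              # -e stops empty directories being skipped
    return cmd
```

The resulting list can be passed to `subprocess.call` once the worker bots are running. For instance, `crawl_command("/rootpath/you/want/to/crawl", "diskover-indexname", min_size=0, include_empty_dirs=True)` produces a crawl that also indexes empty files and directories.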
User Guide
Read the wiki for more documentation on how to use diskover.