
Published on May 16th, 2016 📆 | 8040 Views ⚑


WhoDat — WhoIs Database Front End

WhoIs Database Front End: A Front End for whoisxmlapi Data

WhoDat is a front end for whoisxmlapi data (or any whois data stored in MongoDB that was imported from the same CSV format). It integrates whois data, current IP resolutions, and passive DNS (PDNS). In addition to an interactive, pivotable web front end for analysts to perform research, it has an API that can output JSON, CSV, or a list of suspicious domains. Finally, it pulls updates daily, checks them against a list of known malicious registrants, and emails an alert containing the registrant, domain, and current IP to a specified address.
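The daily alerting step described above amounts to a membership check of each updated record's registrant against a watch list. The following is a minimal sketch of that idea, not WhoDat's actual update script. It assumes each update row is a dict with `registrant_name` and `domainName` keys (field names that appear in the indexing step) plus a hypothetical `ip` key; the function names are illustrative as well.

```python
def find_alerts(update_rows, bad_registrants):
    """Return (registrant, domain, ip) tuples for rows whose registrant is watched."""
    # Normalize case and surrounding whitespace so "Evil Corp " matches "evil corp".
    watch = {name.strip().lower() for name in bad_registrants if name.strip()}
    alerts = []
    for row in update_rows:
        registrant = (row.get("registrant_name") or "").strip()
        if registrant.lower() in watch:
            # "ip" is an assumed key holding the current resolution, if any.
            alerts.append((registrant, row.get("domainName"), row.get("ip")))
    return alerts


def format_alert_body(alerts):
    """Render one tab-separated line per alert for the body of an alert email."""
    return "\n".join(
        "{0}\t{1}\t{2}".format(registrant, domain, ip)
        for registrant, domain, ip in alerts
    )
```

Exact, case-insensitive matching is the simplest policy; a real deployment might also want to match on contact email or telephone.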

The hardware required to power this with 140,000,000 domains is non-trivial: even with only 4 indexed fields, it takes 400 GB of database space for all of the primary TLDs.


WhoIs Database Front End: WhoDat


Installation steps:

  • Install: MongoDB, PHP, the Mongo PHP driver, and pymongo
  • Download latest trimmed (smallest possible) whoisxmlapi quarterly DB dump
  • Extract the CSV files (about 100 GB) and import them with something like this:
for file in */*.csv; do echo $file && mongoimport --db whois --collection whois --file $file --type csv --headerline --upsert --upsertFields domainName; done
  • Fill in your ISC DNSDB Key in keys.php
  • Fill in your PassiveTotal Key in keys.php
  • Index on domainName, contactEmail, registrant_name, and registrant_telephone
db.whois.ensureIndex( {domainName: 1})
db.whois.ensureIndex( {contactEmail: 1})
db.whois.ensureIndex( {registrant_name: 1})
db.whois.ensureIndex( {registrant_telephone: 1})
  • Fill in relevant environmental and alerting data in the script as well as your user/pass to download daily updates
  • Enter known bad registrants you wish to track in a file and specify its location in variable registrantpath
  • Create a cronjob to run the update script at 0430 or so EST: “30 4 * * * /usr/bin/python /YOURUPDATEWORKINGDIR/ >/dev/null 2>&1”
  • Place index.php in your webroot of choice
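In current MongoDB releases `ensureIndex` is a deprecated alias for `createIndex`. Since pymongo is already a prerequisite, the indexing step can also be scripted from Python. This is a sketch under stated assumptions: the `whois` database and collection names come from the `mongoimport` command above, while the function names and the localhost connection are illustrative.

```python
INDEXED_FIELDS = [
    "domainName",
    "contactEmail",
    "registrant_name",
    "registrant_telephone",
]


def create_whois_indexes(collection, fields=INDEXED_FIELDS):
    """Create one ascending single-field index per field; return the index names."""
    # With pymongo, create_index does nothing if an identical index already exists.
    return [collection.create_index([(field, 1)]) for field in fields]


def connect_and_index(host="localhost", port=27017):
    """Assumes pymongo is installed and MongoDB is reachable at host:port."""
    from pymongo import MongoClient  # imported lazily so the helper above has no dependency

    client = MongoClient(host, port)
    return create_whois_indexes(client["whois"]["whois"])
```

Building four indexes over a collection of this size can take a long time, so it is worth running once after the initial import rather than before it.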





The ElasticSearch backend code is still under testing. Please consider the following before using ES as a backend:

  • Some things might be broken
    • I.e., some error handling might be non-existent
  • There might be random debug output printed out
  • The search language might not be complete
  • The data template used with ElasticSearch might change
    • Which means you might have to re-ingest all of your data at some point!


PreReqs to run with ElasticSearch:

  • ElasticSearch installed somewhere
  • python elasticsearch library (pip install elasticsearch)
  • python lex yacc library (pip install ply)
  • below specified prereqs too


ElasticSearch Scripting

ElasticSearch comes with dynamic Groovy scripting disabled due to potential sandbox breakout issues with the Groovy container. Unfortunately, the only way to do certain things in ElasticSearch is via this scripting language. Because the default installation of ES does not have a work-around, there is a setting called ES_SCRIPTING_ENABLED in the pyDat settings file, which is set to False by default. When set to True, the pyDat advanced search capability exposes an extra feature called ‘Unique Domains’: when a search would return multiple results for a given domain (e.g., because multiple versions of that domain match), only the latest entry is returned instead of all entries.

Before setting this option to True, you must install a script server-side on every ES node. To do this, copy the file called _score.groovy from the es_scripts directory to the scripts directory located in the elasticsearch configuration directory. On package-based installs of ES on RedHat/CentOS or Ubuntu this should be /etc/elasticsearch/scripts. If the scripts directory does not exist, create it. Note that you have to restart the node for it to pick up the script.
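To make the effect of ‘Unique Domains’ concrete, here is a client-side sketch of the same idea: collapsing several matching versions of a domain down to the latest one. It is only an illustration. pyDat does this server-side through the Groovy script, and the `dataVersion` key used here is an assumed name for the version field.

```python
def latest_per_domain(hits, version_key="dataVersion"):
    """Keep only the highest-versioned entry for each domainName."""
    latest = {}
    for hit in hits:
        domain = hit["domainName"]
        current = latest.get(domain)
        # Replace the stored entry only when this hit is strictly newer.
        if current is None or hit[version_key] > current[version_key]:
            latest[domain] = hit
    # Sort by domain so the output order is deterministic.
    return [latest[domain] for domain in sorted(latest)]
```

Doing this on the client requires fetching every matching version first, which is why a server-side script is attractive for large result sets.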


ElasticSearch Plugins

The murmur3 mapping type was moved out of the ElasticSearch core and into a plugin. The stats page uses this field to obtain information about the domains loaded in ElasticSearch, and the provided template will not load if the murmur3 mapper is not loaded. Ensure the plugin is installed on every node in your cluster before proceeding. Alternatively, you can remove the ‘hash’ field from domainName in the template and disable the stats page (just HTML-comment out or remove the link from the header).

To install the plugin, use the plugin utility on every node:

plugin install mapper-murmur3

This will require a restart of the node to pick up the plugin.


Populating ElasticSearch with whoisxmlapi data (Ubuntu 14.04.3 LTS)
  • Install ElasticSearch. Using Docker is the easiest way.
  • Download latest trimmed (smallest possible) whoisxmlapi quarterly DB dump.
  • Extract the csv files.
  • Use the included script in the scripts/ directory:
./ -u localhost:9200 -f ~/whois/data/1.csv -i '1' -v -s -x Audit_auditUpdatedDate,updatedDate,standardRegUpdatedDate,expiresDate,standardRegExpiresDate


Local Installation

  • Copy pydat to /var/www/ (or preferred location)
  • Copy pydat/ to pydat/
  • Edit pydat/ to suit your needs.
    • Include your Passive DNS keys if you have any!
  • Configure Apache to use the provided wsgi interface to pydat.
sudo apt-get install libapache2-mod-wsgi
sudo vi /etc/apache2/sites-available/whois

<VirtualHost *:80>
        ServerName whois
        ServerAlias whois
        # Install Location
        WSGIScriptAlias / /var/www/pydat/
        Alias /static/ /var/www/pydat/pydat/static/
        <Location "/static/">
            Options -Indexes
        </Location>
</VirtualHost>


Docker Installation

If you don’t want to install pyDat manually, you can use the docker image to quickly deploy the system. First, make sure to copy to and customize it to match your environment.  You can then launch pyDat by running:

docker run -d --name pydat -p 80:80 -v <path/to/>:/opt/WhoDat/pydat/pydat/ mitrecnd/pydat



pyDat is a Python implementation of Chris Clark’s WhoDat code. It is designed to be more extensible and has more features than the PHP implementation.


Version 2.0 of pyDat introduced support for historical whois searches. This capability necessitated modifying the way data is stored in the database. To aid in properly populating the database, a script called elasticsearch_populate is provided to auto-populate the data. Note that the data coming from whoisxmlapi does not always seem to be consistent, so some care should be taken when ingesting data. More testing needs to be done to ensure all data is ingested properly. Anyone setting up their database should read the available flags for the script before running it to ensure they’ve tweaked it for their setup. The following is the output from elasticsearch_populate -h:

Usage: [options]

  -h, --help            show this help message and exit
  -f FILE, --file=FILE  Input CSV file
  -d DIRECTORY, --directory=DIRECTORY
                        Directory to recursively search for CSV files -
                        prioritized over 'file'
  -e EXTENSION, --extension=EXTENSION
                        When scanning for CSV files only parse files with
                        given extension (default: 'csv')
  -i IDENTIFIER, --identifier=IDENTIFIER
                        Numerical identifier to use in update to signify
                        version (e.g., '8' or '20140120')
  -t THREADS, --threads=THREADS
                        Number of workers, defaults to 2. Note that each
                        worker will increase the load on your ES cluster
  -B BULK_SIZE, --bulk-size=BULK_SIZE
                        Size of Bulk Insert Requests
  -v, --verbose         Be verbose
  --vverbose            Be very verbose (Prints status of every domain parsed,
                        very noisy)
  -s, --stats           Print out Stats after running
  -x EXCLUDE, --exclude=EXCLUDE
                        Comma separated list of keys to exclude if updating
  -n INCLUDE, --include=INCLUDE
                        Comma separated list of keys to include if updating
                        entry (mutually exclusive to -x)
  -o COMMENT, --comment=COMMENT
                        Comment to store with metadata
  -r, --redo            Attempt to re-import a failed import or import more
                        data, uses stored metatdata from previous import (-o
                        and -x not required and will be ignored!!)
  -u ES_URI, --es-uri=ES_URI
                        Location of ElasticSearch Server (e.g.,
  -p INDEX_PREFIX, --index-prefix=INDEX_PREFIX
                        Index prefix to use in ElasticSearch (default: whois)
                        How many threads to use for making bulk requests to ES

Note that when adding a new version of data to the database, you should use either the -x flag to exclude fields whose changes are not important to track or the -n flag to include only the fields that are subject to scrutiny. This will significantly decrease the amount of data that is stored between versions. You can use either -x or -n, but not both at the same time; choose whichever is best for your environment. As an example, if you get daily updates, you might decide that for daily updates you only care whether contactEmail changes, while every quarter you might instead exclude only the fields you don’t find important.
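The trade-off between -x and -n can be illustrated with a small sketch of the comparison they imply: two versions of a record are compared only on the tracked fields, and a new version is worth storing only if those differ. This is an assumption about the general idea, not the logic of elasticsearch_populate itself; the function names are hypothetical.

```python
def tracked_fields(record, exclude=None, include=None):
    """Return the subset of a record that is compared between versions."""
    if exclude and include:
        # Mirrors the rule that -x and -n are mutually exclusive.
        raise ValueError("exclude and include are mutually exclusive")
    if include:
        return {k: v for k, v in record.items() if k in include}
    if exclude:
        return {k: v for k, v in record.items() if k not in exclude}
    return dict(record)


def needs_new_version(old, new, exclude=None, include=None):
    """True if the tracked fields differ between the old and new record."""
    return tracked_fields(old, exclude, include) != tracked_fields(new, exclude, include)
```

With an include list of just contactEmail, a record whose only change is updatedDate would not produce a new stored version, which is the space saving the paragraph above describes.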

Version 3.0 of pyDat introduces ElasticSearch as the backend going forward for storing and searching data. Although the mongo backend should still work, it should be considered deprecated, and installations are encouraged to move to ES as a backend because it provides numerous benefits with regard to searching, including a full-featured query language that allows more powerful searches.


