Published on July 17th, 2015


yarGen – A Generator for Yara Rules (for malware researchers)
yarGen is a generator for Yara rules.
What does yarGen do?

The main principle is the creation of yara rules from strings found in malware files while removing all strings that also appear in goodware files.

Since version 0.14.0 it uses the naive-bayes-classifier by Mustafa Atik and Nejdet Yucesoy to classify strings and detect useful words instead of compression/encryption garbage.

Since version 0.12.0 yarGen does not completely remove the goodware strings from the analysis process but includes them with a very low score. Rules built from them are included only if no better strings can be found and are marked with the comment /* Goodware rule */. Force yarGen to remove all goodware strings with --excludegood. Also since version 0.12.0, you can place the "strings.xml" file from PEstudio in the program directory to apply its blacklist definitions during string analysis, which gives better results.
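The principle described above can be illustrated with a minimal Python sketch. It is not yarGen's actual code: the function names, the in-memory `GOODWARE` dictionary, and the score values are assumptions chosen for illustration only.

```python
import re

# Hypothetical goodware database: string -> number of goodware files containing it.
GOODWARE = {"kernel32.dll": 1200, "GetProcAddress": 950}

def extract_strings(data, min_len=8, max_len=128):
    """Return the printable ASCII runs of at least min_len bytes found in data."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    # Long runs are truncated to max_len characters.
    return {m.decode("ascii")[:max_len] for m in re.findall(pattern, data)}

def score_strings(strings, goodware=GOODWARE, exclude_good=False):
    """Give goodware strings a very low score, or drop them entirely."""
    scored = {}
    for s in strings:
        if s in goodware:
            if exclude_good:      # mimics the idea behind --excludegood
                continue
            scored[s] = -1        # illustrative "very low" score
        else:
            scored[s] = 5         # illustrative base score
    return scored

sample = b"\x00\x01GetProcAddress\x00evil_c2_beacon_v1\x00abc\x00"
print(score_strings(extract_strings(sample)))
```

Keeping goodware strings with a low score, rather than deleting them, leaves a fallback for files in which nothing more distinctive is found.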

The rule generation process tries to identify similarities between the analyzed files and then combines the shared strings into so-called "super rules". So far, super rule generation does not remove the simple rules for the files that have been combined into a super rule, so some redundancy arises when super rules are created. You can suppress the simple rule for a file that is already covered by a super rule by using --nosimple.
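One way to picture the super rule idea is to group strings by the exact set of files that share them. The sketch below is an assumption about how such grouping could work, not yarGen's implementation; `build_rules` and its `no_simple` parameter are hypothetical names.

```python
from collections import defaultdict

def build_rules(file_strings, no_simple=False):
    """file_strings maps a file name to the set of strings selected for it.

    Returns (simple_rules, super_rules), where super_rules maps a frozenset
    of file names to the strings those files have in common.
    """
    # Invert the mapping: string -> files that contain it.
    files_by_string = defaultdict(set)
    for name, strings in file_strings.items():
        for s in strings:
            files_by_string[s].add(name)

    # Strings shared by the same group of two or more files form a super rule.
    super_rules = defaultdict(set)
    for s, files in files_by_string.items():
        if len(files) > 1:
            super_rules[frozenset(files)].add(s)
    super_rules = dict(super_rules)

    # With no_simple set, drop simple rules for files covered by a super rule.
    covered = set().union(*super_rules)
    simple_rules = {
        name: strings
        for name, strings in file_strings.items()
        if not (no_simple and name in covered)
    }
    return simple_rules, super_rules

samples = {"a.exe": {"s1", "s2"}, "b.exe": {"s1", "s3"}, "c.exe": {"s4"}}
simple, supers = build_rules(samples, no_simple=True)
print(sorted(simple), supers)
```

Without `no_simple`, every file keeps its simple rule, which is the redundancy the paragraph above mentions.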

  1. Make sure you have at least 2 GB of RAM on the machine on which you plan to run yarGen
  2. Clone the git repository
  3. Install all dependencies with sudo pip install pickle scandir lxml naiveBayesClassifier
  4. Unzip the goodware database (e.g. 7z x
  5. See help with python --help

    Memory Requirements

    Warning: yarGen loads the whole goodware strings database into memory and uses up to 2 GB of memory for a few seconds.

    Command Line Parameters

    usage: [-h] [-m M] [-g G] [-u] [-c] [-o output_rule_file]
    [-p prefix] [-a author] [-r ref] [-l min-size] [-z min-score]
    [-s max-size] [-rc maxstrings] [-nr] [-oe] [-fs size-in-MB]
    [--score] [--inverse] [--nodirname] [--noscorefilter]
    [--excludegood] [--nosimple] [--nomagic] [--nofilesize]
    [-fm FM] [--noglobal] [--nosuper] [--debug]


    optional arguments:
      -h, --help           show this help message and exit
      -m M                 Path to scan for malware
      -g G                 Path to scan for goodware (don't use the database
                           shipped with yarGen)
      -u                   Update local goodware database (use with -g)
      -c                   Create new local goodware database (use with -g)
      -o output_rule_file  Output rule file
      -p prefix            Prefix for the rule description
      -a author            Author Name
      -r ref               Reference
      -l min-size          Minimum string length to consider (default=8)
      -z min-score         Minimum score to consider (default=5)
      -s max-size          Maximum length to consider (default=128)
      -rc maxstrings       Maximum number of strings per rule (default=20,
                           intelligent filtering will be applied)
      -nr                  Do not recursively scan directories
      -oe                  Only scan executable extensions EXE, DLL, ASP, JSP,
      -fs size-in-MB       Max file size in MB to analyze (default=3)
      --score              Show the string scores as comments in the rules
      --inverse            Show the string scores as comments in the rules
      --nodirname          Don't use the folder name variable in inverse rules
      --noscorefilter      Don't filter strings based on score (default in
                           'inverse' mode)
      --excludegood        Force the exclusion of all goodware strings
      --nosimple           Skip simple rule creation for files included in super
      --nomagic            Don't include the magic header condition statement
      --nofilesize         Don't include the filesize condition statement
      -fm FM               Multiplier for the maximum 'filesize' condition
                           (default: 5)
      --noglobal           Don't create global rules
      --nosuper            Don't try to create super rules that match against
                           various files
      --debug              Debug output

    Best Practice

    See the following blog post for a more detailed description on how to use yarGen for YARA rule creation: How to Write Simple but Sound Yara Rules


    Use the shipped database (FAST) to create some rules

    python -m X:\MAL\Case1401
    Use the shipped database of goodware strings and scan the malware directory "X:\MAL\Case1401" recursively. Create rules for all files included in this directory and below. A file named 'yargen_rules.yar' will be generated in the current directory.

    Show the score of the strings as comment

    yarGen will by default use the top 20 strings based on their score. To see how a certain string in the rule scored, use the "--score" parameter.

    python --score -m X:\MAL\Case1401
    Use only strings with a certain minimum score

    To use only strings that reach a certain minimum score in your rules, use the "-z" parameter. It is good practice to first create rules with "--score" and then perform a second run with a minimum score chosen for your sample set via "-z".

    python --score -z 5 -m X:\MAL\Case1401
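The combined effect of a minimum score and the per-rule string limit can be sketched as follows. This is an illustrative assumption, not yarGen's code: `select_strings` is a hypothetical helper, the defaults mirror the documented values for "-z" and "-rc", and treating the minimum as inclusive is a guess.

```python
def select_strings(scored, min_score=5, max_strings=20):
    """Keep strings scoring at least min_score, best first, up to max_strings."""
    kept = [(s, score) for s, score in scored.items() if score >= min_score]
    # Sort by descending score; ties are broken alphabetically for stable output.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:max_strings]

scored = {"a": 10, "b": 3, "c": 7, "d": 5}
print(select_strings(scored))
```

Reviewing the scores from a first "--score" run makes it easier to pick a cutoff that separates distinctive strings from noise in a given sample set.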
    Preset author and reference

    python -a "Florian Roth" -r "http://goo.gl/c2qgFx" -m /opt/mal/case_441 -o case441.yar
    Exclude strings from Goodware samples

    python --excludegood -m /opt/mal/case_441
    Suppress simple rules if already covered by a super rule

    python --nosimple -m /opt/mal/case_441
    Show debugging output

    python --debug -m /opt/mal/case_441
    Create a new goodware strings database

    python -c -g C:\Windows\System32
    Update the goodware strings database (append new strings to the old ones)

    python -u -g "C:\Program Files"
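The difference between creating and updating a goodware strings database can be pictured with a counter of how many goodware files contain each string. The storage format yarGen actually uses is not described here, so the `Counter` representation and both function names below are assumptions made for this sketch.

```python
from collections import Counter

def create_db(strings_per_file):
    """Build a fresh database: string -> number of files containing it."""
    db = Counter()
    for strings in strings_per_file:
        db.update(set(strings))   # count each string once per file
    return db

def update_db(db, strings_per_file):
    """Return a new database with the new strings appended to the old ones."""
    merged = Counter(db)          # copy, so the old database is left untouched
    merged.update(create_db(strings_per_file))
    return merged

db = create_db([["GetProcAddress", "LoadLibraryA"], ["GetProcAddress"]])
db = update_db(db, [["GetProcAddress", "CreateFileW"]])
print(db.most_common())
```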
    Inverse rule creation (still beta)

    To create inverse rules on goodware, you have to prepare a directory with subdirectories that contain all versions of the files you want to create inverse rules for, each with its original name and in its original folder. If that sounds strange, let me give you an example.
    For example, if you want to create inverse rules for all Windows executables in the System32 folder, you have to create a goodware archive with the following directory structure: