Command line tool

When DeepCASE is installed, it can be used from the command line. The __main__.py file in the deepcase module implements this command line tool. The command line tool provides a quick and easy interface to predict sequences from .csv or .txt files. The full command line usage is given in its help page:

usage: deepcase.py [-h] [--csv CSV] [--txt TXT] [--events EVENTS] [--length LENGTH] [--timeout TIMEOUT]
                   [--save-sequences SAVE_SEQUENCES] [--load-sequences LOAD_SEQUENCES] [--hidden HIDDEN]
                   [--delta DELTA] [--save-builder SAVE_BUILDER] [--load-builder LOAD_BUILDER]
                   [--confidence CONFIDENCE] [--epsilon EPSILON] [--min_samples MIN_SAMPLES]
                   [--save-interpreter SAVE_INTERPRETER] [--load-interpreter LOAD_INTERPRETER]
                   [--save-clusters SAVE_CLUSTERS] [--load-clusters LOAD_CLUSTERS]
                   [--save-prediction SAVE_PREDICTION] [--epochs EPOCHS] [--batch BATCH] [--device DEVICE]
                   [--silent]
                   {sequence,train,cluster,manual,automatic}

DeepCASE: Semi-Supervised Contextual Analysis of Security Events

positional arguments:
  {sequence,train,cluster,manual,automatic}  mode in which to run DeepCASE

optional arguments:
  -h, --help                                 show this help message and exit

Input/Output:
  --csv CSV                                  CSV events file to process
  --txt TXT                                  TXT events file to process
  --events EVENTS                            number of distinct events to handle         (default =  auto)

Sequencing:
  --length LENGTH                            sequence LENGTH                             (default =    10)
  --timeout TIMEOUT                          sequence TIMEOUT (seconds)                  (default = 86400)
  --save-sequences SAVE_SEQUENCES            path to save sequences
  --load-sequences LOAD_SEQUENCES            path to load sequences

ContextBuilder:
  --hidden HIDDEN                            HIDDEN layers dimension                     (default =   128)
  --delta DELTA                              label smoothing DELTA                       (default =   0.1)
  --save-builder SAVE_BUILDER                path to save ContextBuilder
  --load-builder LOAD_BUILDER                path to load ContextBuilder

Interpreter:
  --confidence CONFIDENCE                    minimum required CONFIDENCE                 (default =   0.2)
  --epsilon EPSILON                          DBSCAN clustering EPSILON                   (default =   0.1)
  --min_samples MIN_SAMPLES                  DBSCAN clustering MIN_SAMPLES               (default =     5)
  --save-interpreter SAVE_INTERPRETER        path to save Interpreter
  --load-interpreter LOAD_INTERPRETER        path to load Interpreter
  --save-clusters SAVE_CLUSTERS              path to CSV file to save clusters
  --load-clusters LOAD_CLUSTERS              path to CSV file to load clusters
  --save-prediction SAVE_PREDICTION          path to CSV file to save prediction

Train:
  --epochs EPOCHS                            number of epochs to train with              (default =    10)
  --batch BATCH                              batch size       to train with              (default =   128)

Other:
  --device DEVICE                            DEVICE used for computation (cpu|cuda|auto) (default =  auto)
  --silent                                   silence mode, do not print progress

Examples

Below, we provide various examples of using the command-line tool for running DeepCASE.

Event sequencing

Transform .csv or .txt files into sequences and store them in the file sequences.save.

python3 deepcase sequence --csv <path/to/file.csv> --save-sequences sequences.save
python3 deepcase sequence --txt <path/to/file.txt> --save-sequences sequences.save
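
To try the sequencing step without real data, a small input file can be generated first. The sketch below writes a made-up events file. It assumes the .csv input starts with a header row containing the columns timestamp, event and machine, plus an optional label column; this layout, the event names and the helper write_example_events are illustrative assumptions, so check the Preprocessor documentation for the exact format.

```python
import csv
import io
import sys

# Assumed header layout (not confirmed on this page): timestamp, event,
# machine and an optional label column. Verify against the Preprocessor docs.
FIELDS = ["timestamp", "event", "machine", "label"]


def write_example_events(handle):
    """Write a tiny, made-up events table to an open text handle.

    Returns the number of data rows written (header excluded).
    """
    rows = [
        {"timestamp": 1600000000, "event": "login_failed", "machine": "host-a", "label": 0},
        {"timestamp": 1600000005, "event": "login_failed", "machine": "host-a", "label": 0},
        {"timestamp": 1600000009, "event": "port_scan", "machine": "host-b", "label": 1},
    ]
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Print the example table; redirect stdout to a file to use it as --csv input.
    write_example_events(sys.stdout)
```

Redirecting the output to a file, for example events_example.csv, gives a file that can be passed to --csv in the sequence command above, provided the assumed column layout matches your DeepCASE version.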

ContextBuilder

Train the ContextBuilder on the input samples loaded from the file sequences.save and store the trained ContextBuilder in the file builder.save.

python3 deepcase train\
     --load-sequences sequences.save\
     --save-builder builder.save

Interpreter

Run DeepCASE in cluster mode, where the Interpreter clusters the given sequences. We load the sequences from sequences.save and the trained ContextBuilder from builder.save. We store the Interpreter (containing all clusters) in the file interpreter.save and the generated clusters in clusters.csv. The clusters.csv file contains two columns: cluster and label. We can manually label the individual samples within a cluster by changing the label value. Note that the rows of the CSV file correspond to the loaded sequences. If the sequences themselves contained labels, these labels are stored in the CSV file; otherwise, all clusters are assigned a label of -1.

python3 deepcase cluster\
     --load-sequences sequences.save\
     --load-builder builder.save\
     --save-interpreter interpreter.save\
     --save-clusters clusters.csv
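
Editing the label column by hand becomes tedious for large files. As a minimal sketch, the helper below sets the label of every row that belongs to a given cluster. It relies only on the two columns named above (cluster and label); the function name label_clusters, the example cluster IDs and the assumption that cluster IDs are written as numbers are hypothetical.

```python
import csv
import io


def label_clusters(src, dst, cluster_labels):
    """Copy a clusters CSV from src to dst, relabelling rows by cluster ID.

    src and dst are open text handles; cluster_labels maps a cluster ID
    to the label that every row of that cluster should receive.
    Returns the number of rows whose label was set.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    changed = 0
    for row in reader:
        # Assumption: cluster IDs are numeric, e.g. "3" or "3.0".
        cluster = int(float(row["cluster"]))
        if cluster in cluster_labels:
            row["label"] = cluster_labels[cluster]
            changed += 1
        writer.writerow(row)  # rows of unlisted clusters keep their label
    return changed


if __name__ == "__main__":
    # Made-up file content: four sequences, all still unlabelled (-1).
    example = "cluster,label\n0,-1\n0,-1\n1,-1\n-1,-1\n"
    out = io.StringIO()
    label_clusters(io.StringIO(example), out, {0: 2, 1: 0})
    print(out.getvalue())
```

For real files, open both paths with newline="" as the csv module recommends, and keep the row order unchanged, because each row corresponds to one loaded sequence.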

Manual Mode

Once we have (manually) provided a label for each cluster, we can assign these labels in manual mode and save the updated Interpreter.

Note

If --load-clusters is not specified, DeepCASE will try to use the labels extracted from the sequences it processes (see Preprocessor). If no labels were provided there either, DeepCASE throws an error.

python3 deepcase manual\
     --load-sequences sequences.save\
     --load-builder builder.save\
     --load-interpreter interpreter.save\
     --load-clusters clusters.csv\
     --save-interpreter interpreter_fitted.save

(Semi)-automatic Mode

Once we have assigned labels to the clusters in the Interpreter, we can use DeepCASE to predict labels for new sequences. We save these predicted labels in a file called prediction.csv.

Note

If sequences contain labels (see Preprocessor), we also output a classification report and confusion matrix to show the performance of DeepCASE.

python3 deepcase automatic\
     --load-sequences sequences.save\
     --load-builder builder.save\
     --load-interpreter interpreter_fitted.save\
     --save-prediction prediction.csv
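
The five modes are meant to be run in order, each consuming the files saved by the previous one. The sketch below collects the commands from this page into one list so the file names stay consistent. The helper build_pipeline and its prefix parameter are hypothetical; the modes, flags and file names are the ones used in the examples above. It only prints the commands, since clusters.csv has to be labelled between the cluster and manual steps.

```python
import shlex


def build_pipeline(csv_path, prefix="python3 deepcase"):
    """Return the argument lists for the five DeepCASE steps, in order."""
    base = shlex.split(prefix)
    return [
        # 1. Turn the events file into sequences.
        base + ["sequence", "--csv", csv_path,
                "--save-sequences", "sequences.save"],
        # 2. Train the ContextBuilder on those sequences.
        base + ["train", "--load-sequences", "sequences.save",
                "--save-builder", "builder.save"],
        # 3. Cluster the sequences with the Interpreter.
        base + ["cluster", "--load-sequences", "sequences.save",
                "--load-builder", "builder.save",
                "--save-interpreter", "interpreter.save",
                "--save-clusters", "clusters.csv"],
        # 4. Assign the (manually edited) cluster labels.
        base + ["manual", "--load-sequences", "sequences.save",
                "--load-builder", "builder.save",
                "--load-interpreter", "interpreter.save",
                "--load-clusters", "clusters.csv",
                "--save-interpreter", "interpreter_fitted.save"],
        # 5. Predict labels with the fitted Interpreter.
        base + ["automatic", "--load-sequences", "sequences.save",
                "--load-builder", "builder.save",
                "--load-interpreter", "interpreter_fitted.save",
                "--save-prediction", "prediction.csv"],
    ]


if __name__ == "__main__":
    for command in build_pipeline("path/to/file.csv"):
        print(shlex.join(command))
```

Each list can be passed to subprocess.run(command, check=True) if you prefer to run the steps from Python. In practice the automatic step would usually load sequences built from new events rather than the training sequences reused here.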