Interpreter

The Interpreter takes input sequences (context and events) and clusters them. In order to do this clustering, it uses the attention values from the ContextBuilder after applying the attention query. Besides clustering, the Interpreter also offers methods to assign scores for Manual analysis, and to predict the scores of unknown sequences for Semi-Automatic analysis.

class interpreter.Interpreter(context_builder, features, eps=0.1, min_samples=5, threshold=0.2)[source]
Interpreter.__init__(context_builder, features, eps=0.1, min_samples=5, threshold=0.2)[source]

Interpreter for a given ContextBuilder.

Parameters:
  • context_builder (ContextBuilder) – ContextBuilder to interpret.

  • features (int) – Number of different possible security events.

  • eps (float, default=0.1) – Epsilon used for determining maximum distance between clusters.

  • min_samples (int, default=5) – Minimum number of required samples per cluster.

  • threshold (float, default=0.2) – Minimum required confidence of ContextBuilder before using a context in training clusters.
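
For example, a minimal construction sketch (assuming a ContextBuilder that has already been created and trained as described in its own documentation; the ContextBuilder import path and constructor arguments shown here are assumptions, while the Interpreter arguments are the defaults listed above):

from deepcase.context_builder import ContextBuilder
from deepcase.interpreter import Interpreter

# Assumption: a ContextBuilder for 100 distinct event types, trained beforehand.
context_builder = ContextBuilder(input_size=100, output_size=100)

interpreter = Interpreter(
    context_builder=context_builder,  # trained ContextBuilder to interpret
    features=100,                     # number of different possible security events
    eps=0.1,                          # maximum distance between samples in a cluster
    min_samples=5,                    # minimum number of samples per cluster
    threshold=0.2,                    # minimum ContextBuilder confidence for training clusters
)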

Fit/Predict methods

We provide a scikit-learn-like API for the Interpreter: as a classifier, it labels sequences in the form of clusters and predicts the labels of new sequences. To this end, we implement scikit-learn-like fit and predict methods for training and prediction.

Fit

The fit() method provides an API for directly learning the maliciousness score of sequences. This method combines the Interpreter's Clustering and Manual Mode for sequences whose labels are known a priori. To this end, it calls the cluster(), score_clusters(), and score() methods in sequence. When the labels for sequences are not known in advance, the Interpreter offers the functionality to first cluster sequences and then manually inspect clusters for labelling, as described in the paper. For this functionality, we refer to the cluster(), score_clusters(), and score() methods described in the sections below.

Interpreter.fit(X, y, scores, iterations=100, batch_size=1024, strategy='max', NO_SCORE=-1, verbose=False)[source]

Fit the Interpreter by performing clustering and assigning scores.

Fit function is a wrapper that calls the following methods:
  1. Interpreter.cluster

  2. Interpreter.score_clusters

  3. Interpreter.score

Parameters:
  • X (torch.Tensor of shape=(n_samples, seq_length)) – Input context to cluster.

  • y (torch.Tensor of shape=(n_samples, 1)) – Events to cluster.

  • scores (array-like of float, shape=(n_samples,)) – Scores for each sample in cluster.

  • iterations (int, default=100) – Number of iterations for query.

  • batch_size (int, default=1024) – Size of batch for query.

  • strategy (string (max|min|avg), default=max) – Strategy to use for computing scores per cluster based on scores of individual events. Currently available options are:
      - max: Use maximum score of any individual event in a cluster.
      - min: Use minimum score of any individual event in a cluster.
      - avg: Use average score of any individual event in a cluster.

  • NO_SCORE (float, default=-1) – Score to indicate that no score was given to a sample and that the value should be ignored for computing the cluster score. The NO_SCORE value will also be given to samples that do not belong to a cluster.

  • verbose (boolean, default=False) – If True, prints achieved speedup of clustering algorithm.

Returns:

self – Returns self

Return type:

self
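
A minimal usage sketch (the context, events, and scores variables are hypothetical toy data, and interpreter is the instance constructed in the example above):

import torch

# Toy data: 1000 sequences of 10 context events each, drawn from 100 event types,
# with a known maliciousness score per sequence (0 = benign, 1 = malicious).
context = torch.randint(0, 100, (1000, 10))   # X: shape=(n_samples, seq_length)
events  = torch.randint(0, 100, (1000, 1))    # y: shape=(n_samples, 1)
scores  = [0] * 900 + [1] * 100               # score for each sample

interpreter.fit(
    X=context,
    y=events,
    scores=scores,
    strategy='max',   # a cluster receives the maximum score of its members
    verbose=True,
)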

Predict

When the Interpreter is trained using either the fit() method or the individual cluster() and score() methods, we can use the Interpreter in (semi-)automatic mode. To this end, we provide the predict() function, which takes context and events as input and outputs the labels of the corresponding predicted clusters. If no sequence could be matched, one of the following scores will be given:

  • -1: Not confident enough for prediction

  • -2: Label not in training

  • -3: Closest cluster > epsilon

Note

To use the predict() method, make sure that both the cluster() and score() methods have been called to cluster samples and assign a score to those samples.

Interpreter.predict(X, y, iterations=100, batch_size=1024, verbose=False)[source]

Predict maliciousness of context samples.

Parameters:
  • X (torch.Tensor of shape=(n_samples, seq_length)) – Input context for which to predict maliciousness.

  • y (torch.Tensor of shape=(n_samples, 1)) – Events for which to predict maliciousness.

  • iterations (int, default=100) – Iterations used for optimization.

  • batch_size (int, default=1024) – Batch size used for optimization.

  • verbose (boolean, default=False) – If True, print progress.

Returns:

result – Predicted maliciousness score. Positive scores are maliciousness scores. A score of 0 means we found a match that was not malicious. Special cases:

  • -1: Not confident enough for prediction

  • -2: Label not in training

  • -3: Closest cluster > epsilon

Return type:

np.array of shape=(n_samples,)
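
A hedged sketch of predicting new sequences (reusing the fitted interpreter from the example above; context_test and events_test are hypothetical toy tensors):

# New (context, event) pairs for which to predict maliciousness.
context_test = torch.randint(0, 100, (200, 10))
events_test  = torch.randint(0, 100, (200, 1))

prediction = interpreter.predict(context_test, events_test, verbose=True)

# Non-negative entries are maliciousness scores; negative entries (-1, -2, -3)
# indicate sequences that require manual analysis.
matched = prediction >= 0
print(f"{matched.sum()} sequences predicted, {(~matched).sum()} left for manual analysis")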

Fit_predict

Similar to the scikit-learn API, the fit_predict() method performs the fit() and predict() functions in sequence on the same data.

Interpreter.fit_predict(X, y, scores, iterations=100, batch_size=1024, strategy='max', NO_SCORE=-1, verbose=False)[source]

Fit Interpreter with samples and labels and return the predictions of the same samples after running them through the Interpreter.

Parameters:
  • X (torch.Tensor of shape=(n_samples, seq_length)) – Input context to cluster.

  • y (torch.Tensor of shape=(n_samples, 1)) – Events to cluster.

  • scores (array-like of float, shape=(n_samples,)) – Scores for each sample in cluster.

  • iterations (int, default=100) – Number of iterations for query.

  • batch_size (int, default=1024) – Size of batch for query.

  • strategy (string (max|min|avg), default=max) – Strategy to use for computing scores per cluster based on scores of individual events. Currently available options are:
      - max: Use maximum score of any individual event in a cluster.
      - min: Use minimum score of any individual event in a cluster.
      - avg: Use average score of any individual event in a cluster.

  • NO_SCORE (float, default=-1) – Score to indicate that no score was given to a sample and that the value should be ignored for computing the cluster score. The NO_SCORE value will also be given to samples that do not belong to a cluster.

  • verbose (boolean, default=False) – If True, prints achieved speedup of clustering algorithm.

Returns:

result – Predicted maliciousness score. Positive scores are maliciousness scores. A score of 0 means we found a match that was not malicious. Special cases:

  • -1: Not confident enough for prediction

  • -2: Label not in training

  • -3: Closest cluster > epsilon

Return type:

np.array of shape=(n_samples,)
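
For completeness, a minimal sketch using the same hypothetical toy tensors as in the fit() example above:

# Fit on the toy data and immediately obtain predictions for the same samples.
prediction = interpreter.fit_predict(context, events, scores, strategy='max')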

Clustering

The main task of the Interpreter is to cluster sequences. To this end, the cluster() method automatically clusters the sequences formed by the contexts and events given as input.

Interpreter.cluster(X, y, iterations=100, batch_size=1024, verbose=False)[source]

Cluster contexts in X for same output event y.

Parameters:
  • X (torch.Tensor of shape=(n_samples, seq_length)) – Input context to cluster.

  • y (torch.Tensor of shape=(n_samples, 1)) – Events to cluster.

  • iterations (int, default=100) – Number of iterations for query.

  • batch_size (int, default=1024) – Size of batch for query.

  • verbose (boolean, default=False) – If True, prints achieved speedup of clustering algorithm.

Returns:

clusters – Clusters per input sample.

Return type:

np.array of shape=(n_samples,)
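
A minimal clustering sketch (reusing the hypothetical toy tensors from the fit() example; the convention that unclustered samples receive a negative cluster value is an assumption in this sketch):

# Cluster sequences without assigning scores yet (Manual mode workflow).
clusters = interpreter.cluster(context, events, verbose=True)

# Assumption: samples that could not be clustered are marked with a negative value.
print(f"Found {len(set(clusters[clusters >= 0]))} clusters")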

Auxiliary cluster methods

To create clusters, we recall from Section III-C1 of the DeepCASE paper that we apply attention_query() to the output of the ContextBuilder. Using the obtained attention, we create a vector representing the context (Section III-B2) using the vectorize() method. Both steps are combined in the attended_context() method.

Interpreter.attended_context(X, y, threshold=0.2, iterations=100, batch_size=1024, verbose=False)[source]

Get vectors representing context after the attention query.

Parameters:
  • X (torch.Tensor of shape=(n_samples, seq_length)) – Input context to cluster.

  • y (torch.Tensor of shape=(n_samples, 1)) – Events to cluster.

  • threshold (float, default=0.2) – Minimum confidence required for creating a vector representing the context.

  • iterations (int, default=100) – Number of iterations for query.

  • batch_size (int, default=1024) – Size of batch for query.

  • verbose (boolean, default=False) – If True, prints achieved speedup of clustering algorithm.

Returns:

  • vectors (scipy.sparse.csc_matrix of shape=(n_samples, dim_vector)) – Sparse vectors representing each context with a confidence >= threshold.

  • mask (np.array of shape=(n_samples,)) – Boolean array of masked vectors. True where input has confidence >= threshold, False otherwise.

Interpreter.attention_query(X, y, iterations=100, batch_size=1024, verbose=False)[source]

Compute optimal attention for given context X.

Parameters:
  • X (array-like of type=int and shape=(n_samples, context_size)) – Input context of events, same as input to fit and predict.

  • y (array-like of type=int and shape=(n_samples,)) – Observed event.

  • iterations (int, default=100) – Number of iterations to perform for optimization of actual event.

  • batch_size (int, default=1024) – Batch size of items to optimize.

  • verbose (boolean, default=False) – If True, prints progress.

Returns:

  • confidence (torch.Tensor of shape=(n_samples,)) – Resulting confidence levels in y.

  • attention (torch.Tensor of shape=(n_samples,)) – Optimal attention for predicting event y.

Interpreter.vectorize(X, attention, size)[source]

Compute the total attention for each event in the context. The resulting vector can be used to compare sequences.

Parameters:
  • X (torch.Tensor of shape=(n_samples, sequence_length, input_dim)) – Context events to vectorize.

  • attention (torch.Tensor of shape=(n_samples, sequence_length)) – Attention for each event.

  • size (int) – Total number of possible events, determines the vector size.

Returns:

result – Sparse vector representing each context.

Return type:

scipy.sparse.csc_matrix of shape=(n_samples, n)
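
A hedged sketch of these auxiliary methods (reusing the hypothetical toy tensors from the fit() example; attended_context() combines the attention_query() and vectorize() steps):

# Obtain sparse vectors representing each context after the attention query.
# Only contexts predicted with confidence >= threshold receive a vector; the
# boolean mask indicates which input samples passed the threshold.
vectors, mask = interpreter.attended_context(context, events, threshold=0.2)
print(f"{mask.sum()} of {len(mask)} contexts passed the confidence threshold")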

Manual mode

Once events have been clustered, we can assign a label or score to each sequence. This way, we manually label the clusters and prepare the Interpreter object for (semi-)automatically predicting labels for new sequences. To assign labels to clusters, we provide the score() method.

Note

The score() function requires:
  1. that all sequences used to create clusters are assigned a score.

  2. that all sequences in the same cluster are assigned the same score.

If you do not have labels for all clusters or different labels within the same cluster, the interpreter.Interpreter.score_clusters() method prepares scores such that both conditions are satisfied.

Interpreter.score(scores, verbose=False)[source]

Assigns score to clustered samples.

Parameters:
  • scores (array-like of shape=(n_samples,)) – Scores of individual samples.

  • verbose (boolean, default=False) – If True, print progress.

Returns:

self – Returns self

Return type:

self

Auxiliary manual methods

As mentioned above, the score() function has two requirements:
  1. that all sequences used to create clusters are assigned a score.

  2. that all sequences in the same cluster are assigned the same score.

We provide the score_clusters() method for situations where you only have labels for some sequences, or where the labels for sequences within the same cluster are not necessarily equal. This method applies a given strategy for equalizing the labels per cluster. Additionally, unlabelled clusters will all be labelled using the given NO_SCORE score.

Interpreter.score_clusters(scores, strategy='max', NO_SCORE=-1)[source]

Compute score per cluster based on individual scores and given strategy.

Parameters:
  • scores (array-like of float, shape=(n_samples,)) – Scores for each sample in cluster.

  • strategy (string (max|min|avg), default=max) – Strategy to use for computing scores per cluster based on scores of individual events. Currently available options are:
      - max: Use maximum score of any individual event in a cluster.
      - min: Use minimum score of any individual event in a cluster.
      - avg: Use average score of any individual event in a cluster.

  • NO_SCORE (float, default=-1) – Score to indicate that no score was given to a sample and that the value should be ignored for computing the cluster score. The NO_SCORE value will also be given to samples that do not belong to a cluster.

Returns:

scores – Scores for individual sequences computed using clustering strategy. All datapoints within a cluster are guaranteed to have the same score.

Return type:

np.array of shape=(n_samples)
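
A hedged sketch of the Manual mode workflow combining score_clusters() and score() (assuming cluster() has already been called as in the clustering example above; the partial labelling is hypothetical):

# Analyst labels only part of the clustered sequences; -1 marks unlabelled samples.
partial_scores = [-1] * len(context)
partial_scores[:50] = [1] * 50

# Equalize scores per cluster so that both requirements of score() are satisfied.
cluster_scores = interpreter.score_clusters(partial_scores, strategy='max', NO_SCORE=-1)

# Assign the per-cluster scores to the clustered samples.
interpreter.score(cluster_scores, verbose=True)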

Semi-automatic mode

See interpreter.Interpreter.predict().

I/O methods

The Interpreter can be saved to and loaded from files using the following methods. Please note that interpreter.Interpreter.load() is a classmethod and should be called on the Interpreter class itself rather than on an instance.

Interpreter.save(outfile)[source]

Save model to output file.

Parameters:

outfile (string) – File to output model.

classmethod Interpreter.load(infile, context_builder=None)[source]

Load model from input file.

Parameters:
  • infile (string) – File from which to load model.

  • context_builder (ContextBuilder, optional) – If given, use the given ContextBuilder for loading the Interpreter.

Returns:

self – Return self.

Return type:

self

Example:

from deepcase.interpreter import Interpreter

# Load a previously saved Interpreter (classmethod, called on the class itself).
interpreter = Interpreter.load('<path_to_saved_interpreter>')

# Save the (possibly updated) Interpreter back to disk.
interpreter.save('<path_to_save_interpreter>')