How is machine learning relevant for protective intelligence professionals?

Suppose your work computer could take every threatening communication ever sent to your CEO, study each one that you had already assessed (low/medium/high threat), and then assess future threatening communications using all of the previous ones as the basis for its judgment. Essentially, the computer would generalize about new threatening communications sent to the CEO (classifying them as low/medium/high threat), based on what it learned from studying the previous communications and your assessments of them.

That’s a tall order to fill, right?

That capability is not out of our reach, and it is only one narrow application of machine learning. There are many other applications where machine learning can assist security and protective intelligence professionals.

Machine Learning or Augmented Intelligence (AI)?

AI is simply computers/programs, also referred to as machines, imitating intelligent (human) behavior. Machine learning is a subset of AI in which computers receive data and use it to learn for themselves.

You benefit from machine learning every day. Here are some examples that you are familiar with:

  • Email spam filter
  • Recommendations on Amazon / Netflix
  • Clustering search engines
  • Facial recognition / handwriting recognition

Defining Machine Learning

These are two highly regarded definitions of machine learning.

(Informal) “Field of study that gives computers the ability to learn without being explicitly programmed.”
-Arthur Samuel (1959)

And

(Formal) “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
-Tom Mitchell (Machine Learning, 1st Edition, 1997)

Let’s illustrate this academic definition by using the opening example about threatening communications.

  • Experience (E): the experience of reading multiple threatening communications
  • Task (T): the task of classifying each communication as low/medium/high threat
  • Performance (P): the probability that the computer will accurately classify the communication as low/medium/high threat

In plain English, the program learns from its experience of reading the threatening communications, and its accuracy in classifying communications improves with more experience.
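The relationship between E, T, and P can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the messages, labels, and the word-vote classifier (`train`, `classify`, `accuracy`) are all made up for this example, and only two threat levels are used to keep it short. It trains the same program on one example and then on four, and measures accuracy on the same held-out messages each time.

```python
from collections import Counter, defaultdict

# Hypothetical training messages with analyst labels (the experience E).
TRAINING = [
    ("i am unhappy with your company", "low"),
    ("i will hurt you", "high"),
    ("your product is terrible", "low"),
    ("i will find you and hurt you", "high"),
]

# Hypothetical held-out messages, used only to measure performance P.
TEST = [
    ("i will hurt him", "high"),
    ("your service is terrible", "low"),
    ("unhappy with the product", "low"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        for word in text.split():
            counts[word][label] += 1
    return counts

def classify(counts, text, default="low"):
    """Task T: let every known word vote for the labels it was seen with."""
    votes = Counter()
    for word in text.split():
        votes.update(counts.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else default

def accuracy(counts, test_set):
    """Performance P: fraction of test messages classified correctly."""
    correct = sum(classify(counts, text) == label for text, label in test_set)
    return correct / len(test_set)

accuracy_small = accuracy(train(TRAINING[:1]), TEST)  # little experience
accuracy_full = accuracy(train(TRAINING), TEST)       # more experience
print(accuracy_small, accuracy_full)
```

With this particular toy data, accuracy rises from two of three test messages to all three once the program has seen examples of both labels. Real data is messier, and more experience does not guarantee an improvement on every test set.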

Supervised and Unsupervised Learning

These are the two broad classifications of machine learning: supervised learning and unsupervised learning.

Supervised Learning

Supervised learning is when the computer is given correctly labeled data to learn from. Basically, you give the computer the “correct answers” and the computer studies these, using them as a basis for making a generalization about data outside of the initial training set.

Supervised learning can be broken down into two subgroups: regression and classification. Regression predicts a continuous output value, while classification predicts a discrete output value drawn from a finite set of categories. An example of regression would be a program predicting real estate prices in a given city, while an example of classification would be a program classifying emails as spam or not spam (only two possible outputs).
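The difference between the two output types can be shown with a short Python sketch. Everything here is an assumption made for illustration: the floor areas, prices, link counts, and the helper names (`fit_line`, `fit_class_means`, `classify`) are invented, and the two methods used (a least-squares line and a nearest-class-mean rule) are simply among the simplest representatives of each subgroup.

```python
def fit_line(xs, ys):
    """Regression: least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def fit_class_means(values, labels):
    """Classification: learn the average feature value of each class."""
    means = {}
    for label in set(labels):
        members = [v for v, l in zip(values, labels) if l == label]
        means[label] = sum(members) / len(members)
    return means

def classify(means, value):
    """Return the class whose learned mean is closest to the value."""
    return min(means, key=lambda label: abs(means[label] - value))

# Regression on made-up data: floor area (square meters) vs. price (thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept  # a continuous number

# Classification on made-up data: number of links in an email vs. label.
link_counts = [0, 1, 6, 8]
labels = ["not spam", "not spam", "spam", "spam"]
means = fit_class_means(link_counts, labels)
predicted_label = classify(means, 5)  # one of a finite set of labels

print(predicted_price, predicted_label)
```

The regression model can return any number along its fitted line, whereas the classifier can only ever return one of the labels it saw during training.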

Example

Suppose you gave your computer 200 threatening communications to study. Each communication is made up of (A) the text of the communication, (B) the intelligence analyst’s assessment of the communication (low/medium/high threat), and (C) baseline information obtained by the threat assessment / security professional, such as address history, country of origin, and known predispositions for violence or stalking. The computer can study these, then develop generalizations that it can apply to new threatening communications that it has not seen before. That’s supervised learning: we gave the computer data with the correct answers (low/medium/high threat), as well as background profile information on the subject, and it drew generalizations from these.
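A minimal sketch of this setup follows, assuming a nearest-neighbor approach: a new communication receives the analyst’s label of the most similar past communication. This is one simple supervised method chosen for illustration, not a recommendation. The three records, the single baseline field `prior_violence`, the 0.25 weighting, and the function names are all hypothetical.

```python
# Each hypothetical record: (A) text, (C) one baseline fact, (B) analyst label.
TRAINING = [
    ("your company ruined my vacation", False, "low"),
    ("you will regret this decision", False, "medium"),
    ("i know where you live and i will hurt you", True, "high"),
]

def words(text):
    """Represent a message as the set of its lowercase words."""
    return set(text.lower().split())

def similarity(text_a, history_a, text_b, history_b):
    """Word overlap (Jaccard index) plus a small bonus if baselines match."""
    a, b = words(text_a), words(text_b)
    union = a | b
    overlap = len(a & b) / len(union) if union else 0.0
    bonus = 0.25 if history_a == history_b else 0.0  # assumed weighting
    return overlap + bonus

def assess(training, text, prior_violence):
    """Return the analyst label of the most similar labeled communication."""
    best = max(
        training,
        key=lambda record: similarity(text, prior_violence, record[0], record[1]),
    )
    return best[2]

first = assess(TRAINING, "i will hurt you at your home", True)
second = assess(TRAINING, "my vacation was ruined", False)
print(first, second)
```

The program never receives a rule such as “the word hurt means high threat.” It only compares new communications with the labeled ones it was given, which is why the quality and quantity of the analyst’s labels matter so much.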

In contrast, unsupervised learning occurs when we give the computer data, but no correct answer/label. Basically, we give the computer data, then ask the computer to make sense of it (to give it order). Continuing with the same example above, if we gave the computer the data for our 200 threatening communications (this time, without the analyst’s assessment of threat level), the computer would seek to find patterns/features in the data. In this instance, the computer does not have training examples with the “correct answers.” Rather, it has to analyze the data on its own, based on features of the data, and then give it order. That’s unsupervised learning: a computer learning from data, in the absence of training examples labeled with the correct answers.

Unsupervised Learning

Unsupervised learning occurs when the computer receives unlabeled data, then gives order to data that otherwise has no structure.

A simple example is clustering search engines, such as Carrot2. When you enter a word/name/phrase into the search bar, the program returns a visual map of associated terms (a cluster). The computer was not programmed to produce the specific visual that you see; rather, it has learned to produce it using a machine learning algorithm.
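To make the idea of clustering concrete, here is a small sketch using k-means, a widely used clustering algorithm. It is not a description of how Carrot2 works internally, which this article does not cover. The six points are made-up stand-ins for items described by two numeric features, and the starting centroids are picked by hand so that the result is repeatable.

```python
import math

def kmeans(points, centroids, max_iterations=20):
    """Group points around centroids without using any labels."""
    centroids = list(centroids)
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for index, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == index]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(old)  # keep a centroid that attracted no points
        if updated == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = updated
    return assignments, centroids

# Hypothetical unlabeled data: two numeric features per item.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
assignments, centroids = kmeans(points, centroids=[points[0], points[-1]])
print(assignments)
print(centroids)
```

No “correct answers” were supplied here. The algorithm discovers that the first three points belong together and the last three belong together purely from their positions, which is the sense in which unsupervised learning gives order to unlabeled data.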

The applications of machine learning and, more generally, augmented intelligence in the protective intelligence space are wide-ranging. These key ideas will be recurring themes on Protective Intelligence.

Thank you for reading this introductory article about machine learning. It only scratches the surface. For readers who are interested in learning more about machine learning, there are several highly regarded (free) courses offered online by Stanford University and Columbia University.

Author Credit: This article was written by the Protective Intelligence contributor, Travis Lishok.
