Part I: An Introduction To OSINT Research For Protective Intelligence Professionals
Our two-part series is divided into the following segments: Part I, which highlights the context in which protective intelligence professionals use open source intelligence (OSINT), and Part II, which offers a brief overview of OSINT collection methods and how they relate to our organizations’ protection strategies.
What makes OSINT research so important to protective intelligence professionals is that it is not only a tool we can use for good, collecting information to support decision makers, but also a tool that adversaries can use maliciously against us and our organization’s assets. A basic understanding of OSINT supports our proactive assessment of threats, and it also supports our assessment of our own organization’s OPSEC and information-related vulnerabilities. By understanding OSINT collection methods, we not only better understand how to protect our organization’s sensitive information, but also stay up to date on the access points an adversary might infiltrate and exploit.
We will begin our introduction with an overview of the intelligence cycle, then move on to important background information about the internet.
OSINT & The Intelligence Cycle
I will let one of the authorities in this area define OSINT for us. In the 5th edition of “Open Source Intelligence Techniques,” Michael Bazzell cites the following definition:
“Open Source Intelligence, often referred to as OSINT, can mean many things to many people. Officially, it is defined as any intelligence produced from publicly available information that is collected, exploited, and disseminated in a timely manner to an appropriate audience for the purpose of addressing a specific intelligence requirement. For the CIA, it may mean information obtained from foreign broadcasts. For an attorney, it may mean data obtained from official government documents that are available to the public. For most people, it is publicly available content obtained from the internet.”
You may find it helpful to consider that definition of OSINT in conjunction with the following definition of intelligence:
“Intelligence deals with all the things which should be known in advance of initiating a course of action.” [1]
These definitions are easily understood in the context of the intelligence cycle (pictured below).
Here is a simple explanation of the intelligence cycle:
The intelligence cycle begins with direction from a decision maker. Analysts then define the problem on a granular level and develop a specific plan. Planning is an essential step because the scope of the research needs to be identified, the objectives of the research need to be evaluated in terms of feasibility (resources/time), and information to be collected (and its sources) needs to be identified.
Collection, processing, and analysis are largely self-explanatory: gathering the information, then using inductive and deductive reasoning to determine what it means for decision makers. Production, on the other hand, depends heavily on the consumer of the intelligence product. For an informal intelligence product that stays within the department, it could be as simple as a 300-word email summarizing the analyst’s conclusions; for a Fortune 100 client’s HR/Legal team, it may be a 30-page document with methodologies, supporting documentation, and more. Lastly, analysts need to seek feedback from the consumer (end user) to learn how to better serve their goals and needs going forward.
Facts About OSINT
What makes OSINT so relevant is the proliferation of mediums and users across the web, which gives analysts ever more useful information to collect from an increasing number of sources. The historical growth of the largest current social media platform, Facebook, demonstrates this: Facebook had approximately 1 million active users in 2004, grew to 1 billion eight years later, and reached 2 billion in 2017, five years after crossing the 1 billion mark. For even more context, consider this 2010 statement attributed to Eric Schmidt (former CEO of Google):
“Every two days now we create as much information as we did from the dawn of civilization up until 2003… That’s something like five exabytes of data,” he says.
Source: TechCrunch
The content Eric Schmidt was referring to is the same kind of content (Facebook posts, tweets, Instagram updates, etc.) that analysts and investigators collect to support their security intelligence programs.
Now that we have seen two examples of the proliferation of social media use, let’s examine some quick facts about the most popular social media platforms and key judgements from the Pew Research Center’s recent report: “Social Media Use in 2018.”
Monthly active users per social media platform (approximation):
- Facebook: 2 Billion
- YouTube: 1.5 Billion
- Instagram: 700 Million
- Twitter: 328 Million
- Snapchat: 255 Million
Source: TechCrunch
Conclusions from the Pew Research Center’s “Social Media Use in 2018” (United States):
- “Roughly three-quarters of the public (73%) uses more than one of the eight platforms measured in this survey, and the typical (median) American uses three of these sites. As might be expected, younger adults tend to use a greater variety of social media platforms.”
- “Roughly two-thirds of U.S. adults (68%) now report that they are Facebook users, and roughly three-quarters of those users access Facebook on a daily basis.”
Context: These excerpts come from “Social Media Use in 2018” by the Pew Research Center. In this report, the Pew Research Center included the following social platforms in its study: Facebook, YouTube, Twitter, Instagram, Pinterest, Snapchat, LinkedIn, and WhatsApp. The report was published on March 1, 2018.
Defining Collection Sources: Surface Web, Deep Web, And Dark Web
The term surface web describes those parts of the internet that search engines such as Google have crawled and indexed, and that are therefore searchable. The deep web is the remainder of the internet, which has not been indexed by search engines. A commonly cited estimate holds that about 95% of the internet is deep web, while only about 5% is surface web (see iceberg image below).
So, what parts of the internet are not indexed by Google and the other major search engines? Generally, the contents of most online databases, communities, and networks are not indexed. For example, the photo-sharing website Flickr has over 1 million public photos uploaded to it PER DAY, and much of the content in repositories such as these (in this case, photos) is not indexed by search engines. In addition, website owners can publish a robots.txt file, or add a “noindex” directive to individual pages, to instruct search engines specifically NOT to crawl or index their content (search the phrase “robots.txt” to learn more).
Lastly, the dark web is a subset of the deep web with two general characteristics: (1) its websites are not indexed by the major search engines, and (2) they can only be accessed using special software, methodologies, and related permission layers (for example, the Tor Browser for .onion sites).
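To see what a robots.txt file actually tells a crawler, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the crawler name “MyOSINTBot” are hypothetical. Keep in mind that robots.txt is advisory: well-behaved search engine crawlers honor it, but it does not hide or protect a page from anyone who visits it directly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths and rules are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search

User-agent: Googlebot
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that we already have, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    checks = [
        ("MyOSINTBot", "https://example.com/about"),
        ("MyOSINTBot", "https://example.com/members/list"),
        ("Googlebot", "https://example.com/members/list"),
        ("Googlebot", "https://example.com/private/report"),
    ]
    for agent, url in checks:
        # can_fetch() answers: "may this crawler request this URL?"
        print(f"{agent:<11} {url:<40} allowed={rp.can_fetch(agent, url)}")
```

Note that a crawler matching a specific `User-agent` group (here, Googlebot) follows only that group’s rules, which is why Googlebot may fetch /members/ in this example while other crawlers may not.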
The Nuances Of Internet Browsing
Now that we have discussed the broad categories of the internet, we can consider some of the details of internet browsing: cookies, cache, and IP addresses. Each of these is relevant to our mission because each can undermine an investigator’s privacy, as well as the integrity of their investigation.
Cookies and cache both work to make browsing the internet seamless. Cookies are small pieces of data that a website sends to the user’s browser, which saves them and sends them back on later requests to that site. For example, cookies may store your shopping cart contents or remember that you are already logged into a website. The cookies OSINT researchers are most concerned with are tracking cookies, which websites use to monitor browsing activity.
The cache, on the other hand, stores copies of page resources such as image files in the browser to speed up the loading of web pages. While not as critical as cookies, the cache still leaves a trail of data within your browser that documents, to a degree, your browsing and investigative activity.
Next, it is helpful to note that when a user connects to a website, the website can typically see the user’s IP address (without it, the website would not know where to send the data). So, what can others learn about you from your IP address? Your internet service provider (ISP), your approximate location (country, often city), and more. Here is a simple example of how cookies and IP addresses together can compromise our privacy: if Facebook tracks our online activity and the IP addresses we connect from (which reveal our approximate geographic locations), it can use this information to suggest friends to us, target us with ads, and more.
A quick word on Virtual Private Networks (VPNs): a VPN is a tool that allows computers to connect securely over the internet. Think of it as a secure tunnel through which our data is sent and received. The VPN encrypts the data traveling between point A (your device) and point B (the VPN server), which then forwards your traffic to the rest of the internet, so the websites you visit see the VPN server’s IP address rather than your actual one.
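The following minimal sketch uses Python’s standard-library `http.cookies` module to show what a tracking cookie might look like and why it enables tracking. The `Set-Cookie` header, the cookie name `visitor_id`, and the helper functions `parse_set_cookie` and `track_visits` are hypothetical; real tracking systems are far more elaborate.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header a site might send on your first visit.
# The name, value, and domain are illustrative only.
SET_COOKIE = "visitor_id=a1b2c3d4; Domain=example.com; Path=/; Max-Age=31536000"


def parse_set_cookie(header: str) -> dict:
    """Parse one Set-Cookie header into its name, value, and a few attributes."""
    jar = SimpleCookie()
    jar.load(header)
    # SimpleCookie is dict-like; this header contains exactly one cookie.
    name, morsel = next(iter(jar.items()))
    return {
        "name": name,
        "value": morsel.value,
        "domain": morsel["domain"],
        "max_age_days": int(morsel["max-age"]) // 86400,  # seconds -> days
    }


def track_visits(visits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group page views by the cookie value the browser sends back.

    Each visit is (cookie_value, page). Because the browser returns the same
    identifier on every request, the site can link all of them to one browser.
    """
    history: dict[str, list[str]] = {}
    for cookie_value, page in visits:
        history.setdefault(cookie_value, []).append(page)
    return history


if __name__ == "__main__":
    info = parse_set_cookie(SET_COOKIE)
    print(info)
    log = track_visits([
        (info["value"], "/news"),
        (info["value"], "/profile/target-person"),
        ("zz99", "/news"),  # a different browser with its own identifier
    ])
    print(log[info["value"]])
```

The point of the design is that the site never needs to know who you are: a long-lived identifier (here set to expire after a year) is enough to tie separate visits, including investigative ones, to the same browser.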
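The two properties of the cache described above, faster loading and a lingering record of what was loaded, can be illustrated with a deliberately simplified model. `ToyBrowserCache` and `fake_fetch` are hypothetical names; a real browser cache also honors HTTP caching headers such as `Cache-Control` and stores entries on disk, none of which is modeled here.

```python
from typing import Callable


class ToyBrowserCache:
    """Hypothetical, heavily simplified model of a browser cache (URL -> bytes)."""

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch                  # function that "downloads" a URL
        self._entries: dict[str, bytes] = {}
        self.network_requests = 0

    def get(self, url: str) -> bytes:
        if url in self._entries:
            return self._entries[url]        # served locally: no network request
        self.network_requests += 1
        data = self._fetch(url)
        self._entries[url] = data            # stored copy stays behind afterwards
        return data

    def trail(self) -> list[str]:
        """URLs left in the cache, in the order they were first loaded."""
        return list(self._entries)


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real download so the example runs offline."""
    return f"<contents of {url}>".encode()


if __name__ == "__main__":
    cache = ToyBrowserCache(fake_fetch)
    for url in ["https://example.com/logo.png",
                "https://example.org/profile/photo.jpg",
                "https://example.com/logo.png"]:   # repeat visit hits the cache
        cache.get(url)
    print(cache.network_requests)
    print(cache.trail())
```

Anyone with access to the machine can read that trail back, which is why investigators routinely clear the cache or use a dedicated browser profile for research.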
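To make the IP address point concrete, here is a minimal sketch using Python’s standard-library `http.server`: a throwaway local web server that records the address of whoever connects. `LoggingHandler`, the `seen_ips` list, and `visit_once` are hypothetical names for illustration. Because the demo runs entirely on one machine, the address it records is the loopback address 127.0.0.1 rather than a public IP.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_ips: list[str] = []  # what a site's access log would accumulate


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address is the (host, port) of the connecting socket. The
        # server needs it to send the response back, so it always "sees" it.
        client_ip = self.client_address[0]
        seen_ips.append(client_ip)
        body = f"Your IP address is {client_ip}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def visit_once() -> str:
    """Start a local server, visit it once, and return what it told us."""
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)  # port 0 = any free port
    server.timeout = 5  # don't block forever if the request never arrives
    port = server.server_address[1]
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    # Bypass any proxy settings in the environment so we connect directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
        text = resp.read().decode()
    thread.join()
    server.server_close()
    return text


if __name__ == "__main__":
    print(visit_once(), end="")
```

On a real website the same `client_address` value would be your public IP (or that of a proxy, VPN, or other intermediary your traffic passes through), which the site can then log or geolocate.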
Concluding Part I
In this part of our introduction, we discussed how OSINT fits into the intelligence cycle, the proliferation of social media use, and background information about the internet and browsing. In Part II of this series, we will review concepts used by OSINT investigators and relate those back to our own asset protection strategies.
Thank you for taking the time to dive into this introductory review of OSINT research. I have listed several resources below for those of you who would like to learn more. Enjoy!
- Article: “How I infiltrated a Fortune 500 Company With Social Engineering”
- Article: “Burgling From an OSINT Point of View”
- Complete Privacy and Security Podcast
- Bellingcat Blog
Additional Sources:
[1] Source: Commission on Organization of the Executive Branch of the Government [the Hoover Commission], “Intelligence Activities,” June 1955, p. 26. This was an interim report to Congress prepared by a team under the leadership of Gen. Mark Clark.
Author Credit: This article was written by the Protective Intelligence contributor, Travis Lishok.
Author Bio: Travis Lishok, CPP
The Protective Intelligence team is excited to feature a written piece from Travis Lishok, one of the newest Ontic team members. Travis has nearly 10 years of experience in public and private sector security, including conducting intelligence research and supporting executive protection teams in GSOC operations. As a professional project, Travis creates protective security content via EP Nexus, some of which focuses specifically on OSINT, travel risk, and related topics. As you’ve seen in this article, investigative research is a topic he is enthusiastic about sharing, and that’s why we invited him to contribute this valuable piece.