Paradise Papers (A Tutorial)
Explain Yourself, Cas
For years now I’ve harped on about fraud — from the curiosities I see throughout the stablecoin space, to historical recollections of previous corporate malfeasance. Today, however, I’d like to provide readers with a small, free, learning opportunity: how to utilize the Paradise Papers to mine interesting publicly available data.
History Lesson
The Paradise Papers reached the general public on November 5th, 2017. The data was leaked by an anonymous source to the German newspaper Süddeutsche Zeitung, which shared it with the ICIJ (International Consortium of Investigative Journalists). The leak consists of a slew of data from Appleby (an offshore legal firm founded in Bermuda), Estera (a corporate services provider spun off from Appleby in 2016), and Asiaciti Trust (a Singapore-based trust and corporate services provider). The leak proved important because it displayed how the wealthy, the elite, and financial criminals move money. The fallout was severe.
Requirements
First of all, you’re going to want to get familiar with some basic computer skills if you don’t already have them. This process will require hardware and software before you can even consider getting started.
- An external hard drive. This is always a good investment, but here you’ll be downloading a considerable amount of data (almost a gigabyte) and should have a dedicated folder that doesn’t rely on your computer’s own storage.
- A VPN. VPN stands for “Virtual Private Network” and provides more privacy than surfing the web naked. Is it pivotal for this? No. But worth using always.
- Excel or a similar free spreadsheet program. This is ultimately the main requirement to interpret the Paradise Papers. If you aren’t familiar with how to use a spreadsheet program, I recommend scanning YouTube for some beginner tutorials — that should do the trick.
Tutorial
The first step to scanning the Paradise Papers is to download them. Plug in your hard drive and type https://offshoreleaks.icij.org into your web browser. It should take you to this page that gives you a warning about how to interpret the data you’ll be viewing:
In the upper right corner, you’ll notice a link that states, “DOWNLOAD”.
Clicking this will take you to another page where you are offered four separate folders: Bahamas Leaks, Offshore Leaks, Panama Papers, and Paradise Papers. It will also urge you to download “Neo4j,” which is the data visualizer of choice for the ICIJ. This program comes with a tutorial of its own, but know that you can use the Paradise Papers without it. See below for the page:
The Spreadsheets
Once you’ve downloaded the almost one gigabyte of .zip files, they will appear like so on your hard drive:
The categories are as follows:
- Edges
- Address
- Entity
- Intermediary
- Officer
Which spreadsheet you open depends entirely on what you’re searching for. For instance, if you know the name of an individual who founded or sits on the board of directors of an offshore company, open the “Officer” spreadsheet. If you know the name of the company, but no founders or directors, open “Entity.” Intermediary and Edges won’t prove useful until you know an officer, entity, or address.
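If you’d rather check what is inside a downloaded archive before unzipping it by hand, here is a minimal Python sketch that lists the CSV files in a zip. The function name list_csv_members is my own, and the member names in the demo are placeholders, not the real file names inside the ICIJ download.

```python
import io
import zipfile


def list_csv_members(archive):
    """Return the names of the .csv files inside a zip archive.

    `archive` may be a path on disk or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".csv"))


# Demo on a tiny in-memory archive. The member names are placeholders,
# not the real file names inside the ICIJ download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for member in ("example.edges.csv", "example.nodes.officer.csv", "README.txt"):
        zf.writestr(member, "")
buffer.seek(0)

csv_files = list_csv_members(buffer)
print(csv_files)
```

To use it on a real download, pass the path of the .zip file on your hard drive instead of the in-memory buffer.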
Examples
Using the “Officer” spreadsheet I will show you how this data can shed light on an individual’s corporate history. Today we’ll be using the well-known “Charlie Shrem” as an example of how to perform a search.
I have little knowledge of Shrem’s past or who he is. I know his last name is Shrem, whether he goes by Charles or Charlie on file. I know he spent time in Panama and ran a company with Erik Voorhees. So, let’s open up the Officer spreadsheet in the Panama Papers folder and see what we find out.
The spreadsheet will take a second to load and the data will look kind of incomprehensible and intimidating at first glance. Don’t worry, we know what we’re looking for. Here’s how the Officers spreadsheet appears:
The categories are as follows (all may prove to be important to you):
- node_id (this relates to the spreadsheet titled, “Edges”)
- name (this refers to the individual or trust)
- country_codes & countries (this refers to the country of origin for the establishing director — not necessarily their place of birth)
- sourceID (should almost always simply state the name of the leak you’re viewing)
- valid_until (almost always says “current through 2015” unless otherwise noted, then will state, “manually added”)
All we currently know for sure is that we’re looking for a “Shrem.” With this in mind, we’re going to search the spreadsheet for this term and see what comes up (do this by pressing command+f or ctrl+f or the “Find” option under the “Edit” toolbar). We get two Shrems.
Of course, “Charlie Shrem” is exactly who we were looking for. This doesn’t rule out that Itschak Shrem could somehow be related, but odds are low, and for now we’re not going to look into it. Instead, we’ll take the new data we’ve been provided for Charlie and use it.
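If the spreadsheet is too slow to open, the same search can be scripted. Below is a minimal Python sketch, assuming the Officer file is a CSV with node_id and name columns as listed above. The helper find_by_name is my own name, and the sample rows are a stand-in for the real file: only Charlie Shrem’s node_id comes from this walk-through, and the other ids are placeholders.

```python
import csv
import io


def find_by_name(rows, term, column="name"):
    """Yield rows whose `column` contains `term`, ignoring case."""
    needle = term.casefold()
    for row in rows:
        if needle in (row.get(column) or "").casefold():
            yield row


# Stand-in for the Officer file. Only the Charlie Shrem node_id comes from
# the walk-through; the other ids are made-up placeholders.
sample_officers = io.StringIO(
    "node_id,name\n"
    "12224871,Charlie Shrem\n"
    "PLACEHOLDER-1,Itschak Shrem\n"
    "PLACEHOLDER-2,Example Person\n"
)

matches = list(find_by_name(csv.DictReader(sample_officers), "shrem"))
for row in matches:
    print(row["node_id"], row["name"])
```

For the real file, open it with open(path, newline="", encoding="utf-8") and hand the resulting csv.DictReader to find_by_name. Because the rows are read one at a time, the whole file never has to fit in a spreadsheet window.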
The node_id for Charlie is 12224871. We’re going to open the “Edges” spreadsheet for the Panama Papers and search for that node_id. We’re left with new numbers, but no new interpretable data:
What this means is that Shrem is a shareholder of “10150369” (the entity) and the address of the entity is “14083806.” We move on to two new spreadsheets: the one entitled “Entity” and the one entitled “Address.”
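Here is a rough Python sketch of that Edges lookup. I haven’t listed the Edges headers in this tutorial, so the column names node_1, rel_type, and node_2 are assumptions (check the header row of your own copy), and the rel_type labels in the sample are illustrative. The three node ids are the ones from this walk-through.

```python
import csv
import io


def edges_touching(rows, node_id, start_col="node_1", end_col="node_2"):
    """Return every edge row that has `node_id` at either end."""
    wanted = str(node_id)
    return [r for r in rows if wanted in (r.get(start_col), r.get(end_col))]


# Stand-in for the Edges file. Column names and rel_type labels are
# assumptions; the last row is a placeholder that should not match.
sample_edges = io.StringIO(
    "node_1,rel_type,node_2\n"
    "12224871,shareholder_of,10150369\n"
    "12224871,registered_address,14083806\n"
    "PLACEHOLDER-1,shareholder_of,PLACEHOLDER-2\n"
)

linked = edges_touching(csv.DictReader(sample_edges), 12224871)
for edge in linked:
    print(edge["node_1"], edge["rel_type"], edge["node_2"])
```

Matching on either end of the edge matters because a node_id can appear as the source or the target of a relationship, depending on how the row was recorded.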
The Entity spreadsheet has a new set of headers, as follows:
- node_id (in this case, 10150369)
- name (the name of this entity)
- jurisdiction (code for the jurisdiction whose law governs the entity)
- jurisdiction_description (name of the jurisdiction)
- country_codes (code for where the entity was incorporated)
- countries (name of country where entity was incorporated)
- incorporation (date of incorporation)
- inactivation (date when the entity went inactive)
- struck_off (date when the entity was struck off the books legally)
- closed_date (date of closure)
- ibcRUC (the entity’s company registration number, such as an IBC or RUC number, rather than anything specific to Neo4j)
- status (active, defaulted, shelf company, changed agent, etc)
- service_provider (firm that helped establish the entity)
As you can see, as we connect more dots the information we’re accumulating seems to exponentially increase. We’ll search for 10150369 in the Entity spreadsheet and this is what we’ll be given in return:
We’re now swimming in information for Charlie: Shrem helped found a company called “QUAINTON ASSOCIATES CORPORATION.” It was legally bound to the British Virgin Islands, but Mossack Fonseca helped Charlie establish the company in Panama. QUAINTON was founded on the 12th of February, 2013, and declared inactive on the 6th of November, 2014. It defaulted.
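The two dates tell us how long QUAINTON existed. As a quick worked example in Python, using only the incorporation and inactivation dates quoted above (the variable names are my own):

```python
from datetime import date

incorporated = date(2013, 2, 12)  # incorporation date from the Entity row
inactivated = date(2014, 11, 6)   # inactivation date from the Entity row

lifetime = inactivated - incorporated
print(lifetime.days)                     # 632
print(round(lifetime.days / 365.25, 2))  # 1.73
```

So the company lasted 632 days, a little under a year and nine months.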
Additionally, we’ve yet to enter the address information in the Address spreadsheet. Let’s see what that provides:
- node_id (in this case, 14083806)
- name (blank)
- address (self explanatory)
- country_code (code for country of origin of the entity)
- countries (the country the code is identifying)
For QUAINTON this is what comes up:
It’s a bit of a tangled mess, but if you look past that, you get PH Cristal Park; Oficina №1; Panama City; Republic of Panama, which if typed into Google maps gives you this:
The Conclusion
You can see that with little more than a first and last name we were able to determine Shrem’s company name, where it was legally bound, who helped incorporate it, where it was incorporated, that it defaulted after less than two years in existence, and, lastly, what the physical address looked like. That is an incredible wealth of data returned from very little to begin with.
Thanks for reading. I’ll be doing a second tutorial in a week or two on the data sifter Neo4j and how it can help you visualize the overwhelming volume of information the leaks provide.
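For the curious, the whole chain (Officer, then Edges, then Entity and Address) can be sketched as a single Python function. Treat this as an illustration under assumptions rather than a finished tool: the Edges column names (node_1, rel_type, node_2) and the rel_type labels are guesses, the helper names load and trace_officer are my own, and the sample rows only contain values quoted in this walk-through.

```python
import csv
import io


def load(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def trace_officer(name_term, officers, edges, nodes_by_id):
    """Follow officer -> edges -> linked nodes.

    Returns a list of (officer name, relationship, linked row) tuples.
    """
    needle = name_term.casefold()
    results = []
    for officer in officers:
        if needle not in officer["name"].casefold():
            continue
        for edge in edges:
            if edge["node_1"] != officer["node_id"]:
                continue
            target = nodes_by_id.get(edge["node_2"])
            if target is not None:
                results.append((officer["name"], edge["rel_type"], target))
    return results


# Stand-ins for the four spreadsheets (assumed Edges headers and labels).
officers = load("node_id,name\n12224871,Charlie Shrem\n")
edges = load(
    "node_1,rel_type,node_2\n"
    "12224871,shareholder_of,10150369\n"
    "12224871,registered_address,14083806\n"
)
entities = load(
    "node_id,name,status\n"
    "10150369,QUAINTON ASSOCIATES CORPORATION,Defaulted\n"
)
addresses = load(
    "node_id,address\n"
    "14083806,PH Cristal Park; Oficina №1; Panama City; Republic of Panama\n"
)

# One lookup table keyed by node_id, covering both entities and addresses.
nodes_by_id = {row["node_id"]: row for row in entities + addresses}

trail = trace_officer("shrem", officers, edges, nodes_by_id)
for officer_name, relation, node in trail:
    print(officer_name, relation, node.get("name") or node.get("address"))
```

Keying entities and addresses by node_id in one dictionary mirrors what we did by hand: every number found in Edges is simply looked up in whichever spreadsheet contains it.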