Referenced in our Newsletter Volume 5, Issue 3 - August 2006
Money Laundering: The Exception
There are many factors that must be considered when analyzing data, including the reliability of sources, quality of data,
different formats, security levels, creation date, misspellings, inconsistencies, and more. What may appear to be a
great pattern can actually be less than ideal due to data discrepancies. Please remember, there are always exceptions to
the pattern, and there are often exceptions to the exceptions. The goal is to detect and expose potential targets of interest
and then drill down and interpret the results before making a final decision.
The following example is based on Suspicious Activity Reports (SARs) that banking and financial institutions worldwide file
with their respective Financial Intelligence Units (FIUs). The database was queried to show all Social Security Numbers (SSNs)
that are connected to multiple SUSPECTS. The initial target of interest, shown below, represents a single SSN with two
different SUSPECTS. Occasionally, a SSN will be shared by a husband and wife in certain types of financial transactions.
In this case, the names are not similar, so the investigators consider these people unrelated.
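The query described above (all SSNs connected to more than one SUSPECT) amounts to grouping SAR records by SSN and keeping the groups with two or more distinct suspects. The sketch below assumes a hypothetical record layout of `(sar_id, suspect, ssn)` with made-up values; it is not the schema of any actual SAR database.

```python
from collections import defaultdict

# Hypothetical SAR rows: (sar_id, suspect_name, ssn). Layout and values are
# illustrative only, not taken from a real FIU database.
sars = [
    ("SAR-001", "SUSPECT A", "123-45-6789"),
    ("SAR-002", "SUSPECT B", "123-45-6789"),
    ("SAR-003", "SUSPECT C", "987-65-4321"),
    ("SAR-004", "SUSPECT C", "987-65-4321"),
]

def ssns_with_multiple_suspects(rows):
    """Return {ssn: set_of_suspects} for SSNs linked to two or more distinct suspects."""
    suspects_by_ssn = defaultdict(set)
    for _sar_id, suspect, ssn in rows:
        suspects_by_ssn[ssn].add(suspect)
    # A single suspect filing several SARs under one SSN is not a match.
    return {ssn: names for ssn, names in suspects_by_ssn.items() if len(names) > 1}

print(ssns_with_multiple_suspects(sars))
```

Using a set of names per SSN is what keeps SUSPECT C, who appears twice under the same number, out of the results.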
One important detail in this diagram is that the SSN label shows a "NO," which means the number failed to be properly
validated using the Social Security Administration's authentication algorithms. In other words, the SSN as recorded is not a
legitimately issued number (for more information on the algorithm used, visit the white paper section of our support site).
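The newsletter's own validation algorithm is described in its white paper and is not reproduced here. As a rough illustration only, the sketch below applies a few widely published structural rules for SSNs: the area number (first three digits) cannot be 000, 666, or 900 to 999; the group number (middle two digits) cannot be 00; and the serial number (last four digits) cannot be 0000. Real validation may involve additional checks, so treat this as an assumption-laden sketch rather than the method used to produce the "NO" label.

```python
import re

def is_structurally_valid_ssn(ssn: str) -> bool:
    """Apply a few published structural SSN rules (a sketch, not a full validator)."""
    # Accept 9 digits with optional hyphens in the usual positions.
    match = re.fullmatch(r"(\d{3})-?(\d{2})-?(\d{4})", ssn.strip())
    if match is None:
        return False
    area, group, serial = match.groups()
    # Area numbers 000, 666, and 900-999 are never issued.
    if area in ("000", "666") or area.startswith("9"):
        return False
    # Group 00 and serial 0000 are never issued.
    if group == "00" or serial == "0000":
        return False
    return True

for candidate in ["123-45-6789", "666-12-3456", "912-34-5678", "123-00-4567", "12345678"]:
    print(candidate, "YES" if is_structurally_valid_ssn(candidate) else "NO")
```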
At this point, the investigators need to consider the validity and certainty of the pattern. From here, they want to know why
both SUSPECTS are using the same SSN. The network is expanded to show the ID NUMBER for each, as shown below.
As suspected, their ID NUMBERS (driver's licenses) are different. Next, the investigators want to check the PHONE numbers listed
on the SARs. The premise is that a shared phone number or driver's license, in addition to the shared SSN, would indicate a strong
connection between the two SUSPECTS. The results are shown in the diagram below.
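Expanding the network one attribute at a time and looking for overlap can be sketched as a set intersection per attribute type. The attribute names below mirror the diagram labels (SSN, ID NUMBER, PHONE); all values are invented, and the "strong link" rule is a hypothetical reading of the premise described above, not a rule stated by the investigators.

```python
# Hypothetical expanded network: suspect -> {attribute type: values seen on their SARs}.
network = {
    "SUSPECT A": {"SSN": {"123-45-6789"}, "ID NUMBER": {"DL-111"}, "PHONE": {"555-0100"}},
    "SUSPECT B": {"SSN": {"123-45-6789"}, "ID NUMBER": {"DL-222"}, "PHONE": {"555-0199"}},
}

def shared_attributes(a, b, net):
    """Return {attribute_type: shared_values} for every attribute the two suspects share."""
    shared = {}
    for attr in net[a].keys() & net[b].keys():
        common = net[a][attr] & net[b][attr]
        if common:
            shared[attr] = common
    return shared

links = shared_attributes("SUSPECT A", "SUSPECT B", network)
print(links)  # only the SSN overlaps in this made-up data

# Hypothetical rule: a second shared identifier alongside the SSN strengthens the link.
strong_link = "SSN" in links and ("ID NUMBER" in links or "PHONE" in links)
print("strong link" if strong_link else "SSN-only link")
```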
Yet again, there is no additional overlap. The next step is to look at the ADDRESSES of these SUSPECTS. Addresses are perhaps
the most widely varying data encountered in any system. There are many abbreviations, spellings, and formats used to encode an
address. It is not unusual to see 3, 4, or 5 variations of the same ADDRESS—often differentiated only by extra periods,
commas, or directional encoding (e.g., NW, N., or North).
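Because address variants often differ only in punctuation, case, and directional or suffix encoding, a common first step is to normalize them before comparing. The sketch below uses small, hypothetical lookup tables; a production system would need far more complete tables (for example, the abbreviations in USPS Publication 28) and would still miss many real-world variations.

```python
import re

# Illustrative lookup tables only; real abbreviation lists are much longer.
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
                "NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW"}
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}

def normalize_address(raw: str) -> str:
    """Upper-case, drop periods and commas, collapse whitespace, and
    standardize directionals and a few street suffixes."""
    text = re.sub(r"[.,]", " ", raw.upper())
    tokens = [DIRECTIONALS.get(t, SUFFIXES.get(t, t)) for t in text.split()]
    return " ".join(tokens)

variants = ["123 North Main Street", "123 N. Main St.", "123 n main st", "123 N, Main, ST"]
print({normalize_address(v) for v in variants})  # -> {'123 N MAIN ST'}
```

All four variants collapse to one normalized string, so they would be treated as a single ADDRESS in the network.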
For the two ADDRESSES shown in the diagram, the investigators quickly see that they are not even close to one another. If they
were in the same CITY or STATE, there would be a better chance that the SUSPECTS were related. Instead, these two
addresses are more than 1,300 miles apart, which greatly reduces the likelihood that the SUSPECTS are related.
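Once addresses are geocoded, the distance between them can be estimated with the haversine (great-circle) formula. The sketch below assumes the coordinates are already available; the example pair (roughly Chicago and Phoenix) is chosen purely for illustration and is not the pair of addresses in the diagram. The 1,300-mile comparison simply reuses the figure from the text.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Hypothetical geocoded addresses (illustrative coordinates only).
address_1 = (41.8781, -87.6298)
address_2 = (33.4484, -112.0740)
distance = haversine_miles(*address_1, *address_2)
print(f"{distance:.0f} miles apart; over 1,300 miles: {distance > 1300}")
```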
Finally, the financial transactions are displayed in the network. As shown in the final diagram, each SUSPECT has only
a single, unique transaction (SAR). This tells the investigators that the SUSPECTS are not engaged in multiple
transactions. The investigators can therefore conclude that the common SSN is most likely a data entry error and that the
entire network can be discounted. Had each SUSPECT been linked to more than one transaction, it would be highly unlikely that
the same transposition occurred on every one of them; in that case, the investigators would aggressively pursue
these SUSPECTS.
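The decision described above depends on how many SARs each SUSPECT has under the shared SSN. The sketch below encodes that reasoning with a hypothetical `(sar_id, suspect, ssn)` record layout. The "review" outcome, for the mixed case where only some suspects have repeat SARs, is an assumption added for completeness; the text does not address that case.

```python
from collections import Counter

def triage_shared_ssn(records, ssn):
    """Triage an SSN shared by several suspects.

    records: iterable of (sar_id, suspect, ssn) tuples (hypothetical layout).
    Returns "discount" when every suspect has exactly one SAR with this SSN,
    "pursue" when every suspect has several, and "review" for mixed cases.
    """
    counts = Counter(suspect for _sar_id, suspect, s in records if s == ssn)
    if len(counts) < 2:
        return "not shared"
    if all(n == 1 for n in counts.values()):
        return "discount"  # a one-off entry error is the likeliest explanation
    if all(n > 1 for n in counts.values()):
        return "pursue"    # the same error repeating on every SAR is unlikely
    return "review"

sars = [
    ("SAR-001", "SUSPECT A", "123-45-6789"),
    ("SAR-002", "SUSPECT B", "123-45-6789"),
]
print(triage_shared_ssn(sars, "123-45-6789"))  # -> discount
```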
What looked like a promising pattern quickly deteriorated into a review of "bad" data. Numbers in particular are easily
misread: 2s look like 5s, 4s like 9s, and 1s like 7s. All too often, these simple misreadings and
transpositions create false links that complicate the analysis.
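One way to test whether two identifiers could differ only by such an error is to enumerate every string reachable by a single look-alike substitution (using the 2/5, 4/9, and 1/7 pairs named above) or a single swap of adjacent digits. The sketch below does exactly that; the confusion table is limited to the pairs in the text, and a real system might use a larger, empirically derived table.

```python
# Look-alike digit pairs taken from the text (2/5, 4/9, 1/7).
CONFUSABLE = {"2": "5", "5": "2", "4": "9", "9": "4", "1": "7", "7": "1"}

def lookalike_variants(digits: str) -> set:
    """Strings reachable by one look-alike substitution or one adjacent transposition."""
    variants = set()
    for i, ch in enumerate(digits):
        if ch in CONFUSABLE:
            variants.add(digits[:i] + CONFUSABLE[ch] + digits[i + 1:])
    for i in range(len(digits) - 1):
        if digits[i] != digits[i + 1]:
            variants.add(digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:])
    variants.discard(digits)
    return variants

def digits_only(value: str) -> str:
    """Strip hyphens so formatted and unformatted SSNs compare equally."""
    return value.replace("-", "")

def could_be_entry_error(a: str, b: str) -> bool:
    """True if b differs from a by a single misread digit or adjacent swap."""
    return digits_only(b) in lookalike_variants(digits_only(a))

print(could_be_entry_error("123-45-6789", "723-45-6789"))  # 1 misread as 7
print(could_be_entry_error("123-45-6789", "213-45-6789"))  # first two digits swapped
```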
In this example, all of the details for the SUSPECTS could have been presented in one step; however, because the
interpretation of each entity was important, each was introduced one at a time.