July 2003  Volume 2, Issue 7   
What's New?

VAI shipped our first maintenance patch for version 2.0. Maintenance patches generally focus on bug fixes and minor enhancements. This version, which we've numbered 2.0.1, addresses a number of reported issues (see BCD Speaks) and includes some minor enhancements. See our Support Site to download this update.

Look for one more maintenance patch in the next few weeks and the next major release (VisuaLinks 2.1) later in the summer. Version 2.1 will include a number of key features that we'll discuss in the coming months: Parallel Coordinates Placement, Social Network Analysis and Terminate Query, among others. Check here in coming issues for more information.
Here are a few tips and tricks to help you use VisuaLinks quickly and more efficiently.

Data Control Panels
From the Data Control Panels on the left (Query Results, Favorites, Details), you can print selected items:
  Select the items you wish to print (click Select All to select all rows)
  Use the Output menu to choose Print/Create Image
  Proceed as with other print functions

Results Sets
You can manage results sets, including renaming the current results set, using the Information Set icon in the View Toolbar.
Instant Message Feature
VisuaLinks includes an instant message feature. To send an instant message:
  Select the Instant Message tab
  Select a user from the list of users in the Users Panel
  Type your text in the message area
  Click Send in the lower, right corner

Details Panel
You can selectively build a list of details in the Details Panel by changing the operation of the Details mode:
  Click the triangle in the lower left corner of the Details icon
  Select Accumulate Details and click Close
  Objects you click or select will be accumulated in the Details Panel, selectively building a list of objects.
To return to normal details operation:
  Click the triangle in the lower left corner of the Details icon
  Select Show Details and click Close
Summarize
The Summarize feature has been available in VisuaLinks for quite some time and has been enhanced with each release. Summarize is very useful for getting breakdowns and counts for various values in a dataset. It can be compared to the traditional OLAP (online analytical processing) features often offered for data warehousing applications.

The Main Settings tab on the Summarize service requires that you select the model you want to summarize along with a single object type. Once an object is selected, the "Count" column is populated with the object's corresponding attributes. Any number of attributes can be counted, though it helps to use a value that is uniquely defined, such as a transaction ID, a control number, a full name, or an ID number. Once an attribute is selected, the optional "Group By" column becomes active to provide more resolution for the query.

Basically, Summarize counts the number of unique values that occur in one field based on the observance of a value in another field. When forming a Summarize query, think of the following statement: "Show me the count of how many X values exist" or, with the Group By defined, "Show me the count of how many X values exist for each unique Y grouping." Examples of Summarize Count and optional Group By selections are shown below:
  Show count of all PHONE CALLS
  Show count of all STATES
  Show count of all FORM TYPES
  Show count of all SUBJECTS group by ZIPCODE
  Show count of all TRANSACTIONS group by DATE
  Show count of all DRIVERS' LICENSES group by SUSPECTS
  Show count of all TRANSACTIONS group by STATES
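The "count of X for each unique Y" idea can be sketched in a few lines of Python. This is only an illustration of the counting logic, not how the Summarize service is implemented; the records, field values, and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records; values are illustrative, not taken from the SAR-BSA sample model.
records = [
    {"DCN": "D001", "OCCUPATION": "CEO", "STATE": "CA"},
    {"DCN": "D002", "OCCUPATION": "CEO", "STATE": "NY"},
    {"DCN": "D003", "OCCUPATION": "JUDGE", "STATE": "CA"},
    {"DCN": "D004", "OCCUPATION": None, "STATE": "CA"},
]

def summarize(rows, count, group_by=()):
    """Count the unique values of `count` for each combination of `group_by` values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in group_by)
        groups[key].add(row[count])
    return {key: len(values) for key, values in groups.items()}

# "Show me the count of how many DCN values exist"
total = summarize(records, "DCN")

# "Show me the count of how many DCN values exist for each unique OCCUPATION"
by_occupation = summarize(records, "DCN", ["OCCUPATION"])
```

With no Group By, every row falls into a single group, so the result is one overall count. Passing more Group By fields produces more groups, each with a smaller count.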
Figure 1
For example, in Figure 1, using the SAR-BSA data model (a sample model shipped with VisuaLinks), the object SAR-DCN has been selected. In the Count column, the attribute DCN (which stands for Document Control Number) has also been chosen. This means that every unique value associated with a DCN will be counted. Since the DCN is always unique in this dataset, the result of a Count will be the total number of records that exist in the data. Notice there is no Group By defined for this example.

In this example, the result of the Summarize query indicates there are 48 unique DCNs in the SAR-BSA database.


The system will always return the TYPE and the COUNT for each column selected. This is a very efficient way to get quick counts for the number of values that appear in various fields (attributes) contained within the database.
Figure 2
In Figure 2, the Count is still set to the DCN and the Group By is now set to OCCUPATION. The desired outcome is to show how many times each listed occupation was used in a financial transaction. The first 10 results are shown below: 9 different DCNs list CEO as the OCCUPATION, followed by 6 for JUDGE and 6 with a NULL value.

In Figure 3, additional Group By conditions have been added to include the STATE and YEAR associated with the SAR-DCN. After running this query, we see that each unique OCCUPATION-STATE-YEAR combination provides much greater resolution on what is happening within the dataset, albeit with smaller counts for each unique set of values. In this database there are 2 occurrences of the MOVIE PRODUCTION occupation in the state of CA for the filing year 2000. Typically, the more Group By conditions used with Summarize, the more rows are returned, since there are more combinations of Group By values for each Count defined.

Figure 3
Any entry returned for a Summarize query can be used to reconstruct the data objects that reflect the particular conditions selected.

One final note: the Apply Distinct option should be used when there is a chance that the same value for the Count attribute may appear more than once in the database. This may occur when using database views, because many-to-many relationships often replicate values, especially when outer joins are invoked. To get a true count of the number of unique values in the database, this option should be used.
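The effect of replicated values in a view can be reproduced with a small SQLite example. It assumes that a distinct count behaves like SQL's COUNT(DISTINCT ...); whether Apply Distinct issues exactly this SQL is not stated here. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (dcn TEXT PRIMARY KEY);
    CREATE TABLE subject (dcn TEXT, name TEXT);
    INSERT INTO report VALUES ('D001'), ('D002'), ('D003');
    -- D001 has two subjects, so the view below repeats it.
    INSERT INTO subject VALUES ('D001', 'A'), ('D001', 'B'), ('D002', 'C');
    CREATE VIEW report_subject AS
        SELECT report.dcn AS dcn, subject.name AS name
        FROM report LEFT OUTER JOIN subject ON subject.dcn = report.dcn;
""")

# Plain count: one per row of the view, including the replicated D001.
plain_count, = conn.execute("SELECT COUNT(dcn) FROM report_subject").fetchone()

# Distinct count: one per unique DCN, regardless of how often the join repeats it.
distinct_count, = conn.execute(
    "SELECT COUNT(DISTINCT dcn) FROM report_subject"
).fetchone()

print(plain_count, distinct_count)
```

The plain count is inflated by the joined rows, while the distinct count matches the three reports actually stored.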


There is one caveat to remember when using Apply Distinct. This feature throws an error when used with Microsoft® Access databases (.mdb files), since Access does not support it. The administrator of your system should be aware of this limitation and disable the feature in the model definition.

Temporal analysis is the topic of this month's newsletter. You may recall in the May 2003 newsletter, there was a discussion of behavioral analysis with a specific emphasis on temporal patterns. This was a high-level "introductory" overview of the two primary forms of temporal patterns, relative and absolute. Additionally, in the January newsletter, the "Link Chart of the Month" featured a temporal grid (using absolute placement) depicting financial transactions. This month, we delve deeper into the distinctions between the two types of temporal patterns and provide you with a number of examples.

When detecting temporal behaviors, one must reflect on the type of data that are available for supporting such patterns. Typically, we think about "transactional" events such as financial deposits and withdrawals, border crossings, credit card purchases, travel events, terrorist actions, narcotics dealing, telephone tolls, etc. The common thread between all transactions is that they support a time/date component, that is, they all represent some type of "event" that has occurred at a certain time on a specific date.

Usually a single transaction is not significant. However, when all transactions are viewed collectively for a specific type of data (e.g., a phone number, a credit card, an account), we can infer behavior based on how the transactions occurred. Viewing transactions in the context of other transactions can lead to some very interesting results. The "trick" is knowing how to define the context and how to interpret the results.

Keep in mind that every transaction can be considered separate and distinct from every other transaction in the world. It is the common element (e.g., the account, the phone, etc.) that ties the transactions together. Usually, the "face-value" of the temporal data contained in a transaction represents a single point in time or a single date. Using only the "raw" time/date has limited value since the best we can accomplish is to see which transactions happened on the same date.

The VisuaLinks diagram in Figure 4 shows clusters of transactions that occurred on the same date. Since this is a "relative" placement, we know that each cluster occurred at a later date (clockwise, starting at the 12:00 position), but we do not know whether a day, week, month, or year has transpired between successive events, only that each occurred at a later date and/or time. The only noteworthy item in this diagram is the cluster near the 2:00 position that represents two transactions that occurred on the same date. Short of drilling down on each object to look at the different dates, there is little more we can do with the temporal aspect of these transactions using relative placement.


In Figure 5, the same data is shown using a traditional timeline placement technique. However, it is still hard to decipher what types of behavior or activities are hidden in these transactions. Although we have introduced more of an "absolute" layout in the timeline placement, it is difficult to see the patterns. In general, this layout can help show common sequences among events, such as transactions that tend to occur during similar periods of time.

To be effective, other dimensions of the date/time must be utilized to help expose patterns and trends. This leads into an entirely new topic area related to meta-data extraction. "Meta-data" is "data about data" - data that explains attributes and features of other data. We'll cover meta-data in detail in a future issue. For now, however, date/time values contain a number of additional dimensions (meta-data) that can be calculated to support the analyses. In the timeline placement shown above, the x-axis was assigned to represent the month of the transaction. To derive this information requires a very simple calculation. Here is an example of the other types of date-related meta-data that can be extracted and used for analysis:
Figure 6

  DOW-Day of Week
  DOM-Day of Month
  DOY-Day of Year
  DOQ-Day of Quarter
  WOQ-Week of Quarter
  WOY-Week of Year
  WOM-Week of Month
  MOY-Month of Year
  QTR-f-Quarter Fiscal
  QTR-c-Quarter Calendar
  Season/Holiday/Leap
  Year
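Most of these dimensions are simple calculations on a date/time value. The Python sketch below derives several of them. The `date_dimensions` helper is hypothetical, week-of-year follows the ISO 8601 convention (an assumption; other week-numbering schemes exist), and the fiscal quarter is omitted because it depends on when an organization's fiscal year starts.

```python
from datetime import date, datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def date_dimensions(ts):
    """Derive date-related meta-data from a single datetime value."""
    quarter = (ts.month - 1) // 3 + 1
    quarter_start = date(ts.year, 3 * (quarter - 1) + 1, 1)
    day_of_quarter = (ts.date() - quarter_start).days + 1
    return {
        "DOW": DAY_NAMES[ts.weekday()],        # day of week
        "DOM": ts.day,                         # day of month
        "DOY": ts.timetuple().tm_yday,         # day of year
        "DOQ": day_of_quarter,                 # day of (calendar) quarter
        "WOQ": (day_of_quarter - 1) // 7 + 1,  # week of quarter, 7-day blocks
        "WOY": ts.isocalendar()[1],            # ISO week of year
        "WOM": (ts.day - 1) // 7 + 1,          # week of month, 7-day blocks
        "MOY": ts.month,                       # month of year
        "QTR-c": quarter,                      # calendar quarter
        "HOUR": ts.hour,                       # hour of day
        "Year": ts.year,
    }

dims = date_dimensions(datetime(2003, 7, 14, 7, 30))
print(dims)
```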

These additional dimensions can be exploited to relate the transactions to a particular pattern. For example, the following sequence of diagrams shows how these dimensions are used when placed into the Temporal Grid (one of the placement algorithms in VisuaLinks). The X-Axis represents the DAY-OF-WEEK (so there will be 7 entries, plus a null column) and the Y-Axis represents the WEEK-OF-YEAR (52 entries, plus a null row). The data presented above is now shown in a temporal grid:

Using the Temporal Grid format (as shown in Figure 6), you can see that the most favored DAY-OF-WEEK is Monday (8 total). The layout also shows no activity in the middle rows of the grid, which equates to the summer time-frame. These transactions tend to happen in a somewhat random order and have very limited "repeating" temporal behavior (except for the Monday correlations). Given that these are financial transactions, this leads us to believe the associated organization is not involved in legitimate dealings.
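The binning behind such a grid can be sketched as follows. This is not the VisuaLinks placement algorithm, only a count of events per (week-of-year, day-of-week) cell. The dates are hypothetical, and weeks and weekdays follow the ISO convention with Monday as the first day, which is an assumption.

```python
from collections import Counter
from datetime import date

# Hypothetical transaction dates for one account.
transactions = [
    date(2003, 1, 6), date(2003, 1, 13), date(2003, 2, 3),
    date(2003, 2, 7), date(2003, 11, 17),
]

def temporal_grid(dates):
    """Map (week_of_year, day_of_week) to an event count; Monday is day 0."""
    grid = Counter()
    for d in dates:
        _, week, weekday = d.isocalendar()  # ISO weekday: Monday=1 .. Sunday=7
        grid[(week, weekday - 1)] += 1
    return grid

grid = temporal_grid(transactions)

# Column totals reveal the most favored day of the week.
by_day = Counter()
for (week, day), n in grid.items():
    by_day[day] += n
busiest_day, busiest_count = by_day.most_common(1)[0]

# Rows with any activity; long gaps between them show up as empty bands in the grid.
active_weeks = sorted({week for week, _ in grid})
```

In this sample, four of the five events fall on a Monday, and nothing occurs between week 6 and week 47, which is the kind of empty band described above.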

The following VisuaLinks diagrams show a number of other types of temporal grids using the same general layout format. Each of these diagrams was generated using financial data filed on people/companies involved with money laundering activities. A short description follows each diagram.

Figure 7
The temporal pattern shown in Figure 7 is extremely regular and clearly shows a bias for monthly filings. In the first half of the year, most of the filings occurred on a Monday. This pattern changed abruptly during the summer to a different day. It was determined that the pattern reflects when the filing institution submitted its suspicious transaction reports (as consolidated reports) rather than when each separate violation occurred. This pattern is indicative of a "compliance" training issue for improper transaction reporting.

The pattern shown in Figure 8 is based on "2 weeks on - 2 weeks off." Every 2 weeks, the pattern comes on strong from Monday-Friday (no weekend activity) and then drops off the following 2 weeks. Additionally, the filings on Monday tend to represent several transactions. This is a highly unusual pattern that cannot be traced to any legitimate business operation that would behave in such a fashion.

The pattern in Figure 9 is based on an individual who frequents casinos. What is unique about this pattern is the bias for weekend and holiday gambling. The first column is Sunday and the last column is Saturday, which is why there is a gap during the middle part of the week. It is believed this person is gainfully employed because the activity is focused on weekend and holiday time periods.


Figure 8
Figure 9
Figure 10 is based on financial transactions associated with a convenience store. Similar to the casino data shown previously, this convenience store prefers moving money on Mondays and Fridays. Although this type of behavior could be explained as normal business activity (weekly earnings, check cashing, etc.), the dollar volume associated with these transactions is not consistent with the type and location of this particular store. Also notice that the pattern abruptly stops around August, yet the store is still in business - so something drastically changed their behavior.

Figure 10
Keep in mind that time (e.g., HOUR-OF-DAY) can also be used to analyze temporal patterns.

In Figure 11, the x-axis is set to HOUR-OF-DAY (24 columns) and the y-axis is still set to WEEK-OF-YEAR (52 rows). In this layout you can see that this target (a vehicle crossing a land-border port-of-entry) shows a very distinct "commuter" pattern around 7:00 am. There are subtle variations to the pattern (e.g., 5:00 am to 9:00 am), but for the most part, the vehicle is crossing consistently each week. We also notice that time was taken off during the summer, perhaps for a vacation.

Figure 11
These absolute temporal patterns are extremely useful when the pattern fits a known cycle-of-convenience (e.g., hours, days, weeks, months, etc.). However, other patterns can be exposed based on the co-occurrence of events - such that there is no common element between the events except date/time. We classify these as "relative sequential" temporal patterns. Of course, the detection of these kinds of patterns in the "real world" can be coincidental or intentional.

For example, if we are analyzing the border crossings at a busy port-of-entry along the Southwest border and are looking for cars carrying drugs (load-cars) being escorted by spotter-cars, we would want to focus on the co-occurrence of crossings between various vehicles. However, more often than not, we may find an unusual number of relative sequential patterns in the data during the rush-hour commute. Here we see a large number of vehicles consistently crossing at the same time every day, even though the crossings are unrelated. Therefore the pattern tends to be less "of interest" because of the number of false positives associated with the volume of crossings during this time.
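Counting how often two different vehicles cross within a short window of each other can be sketched as follows. The plates, timestamps, five-minute window, and the `co_crossings` helper are all hypothetical, and the pairwise comparison is written for clarity rather than for large volumes of data.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations

# Hypothetical crossings: (license plate, crossing time).
crossings = [
    ("AAA111", datetime(2003, 7, 1, 14, 2)),
    ("BBB222", datetime(2003, 7, 1, 14, 4)),
    ("CCC333", datetime(2003, 7, 1, 18, 30)),
    ("AAA111", datetime(2003, 7, 8, 9, 15)),
    ("BBB222", datetime(2003, 7, 8, 9, 16)),
    ("CCC333", datetime(2003, 7, 8, 9, 50)),
]

def co_crossings(events, window=timedelta(minutes=5)):
    """Count, per pair of plates, how many times they crossed within `window`."""
    pairs = Counter()
    ordered = sorted(events, key=lambda event: event[1])
    # combinations() keeps the sorted order, so t_b is never earlier than t_a.
    for (plate_a, t_a), (plate_b, t_b) in combinations(ordered, 2):
        if plate_a != plate_b and t_b - t_a <= window:
            pairs[tuple(sorted((plate_a, plate_b)))] += 1
    return pairs

pairs = co_crossings(crossings)
print(pairs)
```

A pair that recurs on several days is more interesting than a single coincidence. During rush hour, though, many unrelated pairs would pass the same test, which is the false-positive problem described above.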

Detecting relative sequential patterns can also be accommodated with VisuaLinks using the "Relative Time" service. This feature searches for repeating related occurrences within an information set based on static time differences between events. It utilizes dynamic trigger event keys and allows the user to specify the delta-cycle (difference in time periods) in which each event is observed. Although it sounds complex, it is actually a very easy subsystem to use once you understand the parameters necessary to configure the search algorithms.
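The idea of a repeating static time difference can be illustrated with a short Python sketch. It is not the Relative Time service's algorithm and does not model trigger event keys. It only looks for gaps between consecutive events that recur, using hypothetical dates and a hypothetical `repeating_deltas` helper.

```python
from collections import Counter
from datetime import date

# Hypothetical event dates for a single account.
events = [
    date(2003, 1, 6), date(2003, 1, 20), date(2003, 2, 3),
    date(2003, 2, 17), date(2003, 3, 5),
]

def repeating_deltas(dates, min_repeats=2):
    """Return {gap_in_days: occurrences} for gaps between consecutive events
    that occur at least `min_repeats` times."""
    ordered = sorted(dates)
    deltas = Counter(
        (later - earlier).days for earlier, later in zip(ordered, ordered[1:])
    )
    return {days: n for days, n in deltas.items() if n >= min_repeats}

cycles = repeating_deltas(events)
print(cycles)
```

Here a 14-day gap recurs three times, while the final 16-day gap appears only once and is filtered out.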

This simple introduction to temporal patterns only touches the surface of what can actually be done in this arena. Research is constantly being conducted by Visual Analytics to help make detecting patterns based on time more accurate, reliable, and easy to incorporate into everyday analytics. Additionally, "time" is analogous to "space" such that we expect many of these concepts to apply to detecting spatial patterns within data sets - based on alignments, position, and orientation. Designs already exist to support additional types of temporal patterns in future versions of VisuaLinks.
Possible Domestic Terrorist Shooting

The following scenario illustrates the use of VisuaLinks in a suspected domestic terrorism incident. The incident under investigation was a shooting at a large religious institution. After the incident, the only evidence, besides some vague witness accounts, was a retrieved bullet. The bullet was submitted to the FBI for processing. The FBI found the weapon in its ballistics database and returned a make, serial number and purchase information to the investigators. When the gun owner was questioned, he indicated he had reported the gun stolen, but hadn't yet filed a police report.

Investigators queried a local database on the weapon serial number and conducted a first-level walk, yielding the following image:

Figure 12
By walking the data in Figure 12 to a database of subpoenaed financial and telephone records, we see with whom our target has been in contact and his banking transactions two weeks prior to the incident (Figure 13).

Figure 13

Due to the nature of the incident, investigators suspected that it had a domestic terrorist basis. The data was then walked against additional databases of known domestic terrorist groups, gang activities and associations. The results are shown in Figure 14:


Figure 14
As can be seen in Figure 14, our target (top row, connected to the original weapon) not only shares a bank account with persons in the domestic terrorist and gang database, but has also made telephone calls to other individuals in the database. Analysis revealed that PERSON 2 made a large cash deposit into the account used by our target. Most telling, however, is that both the TARGET and PERSON 2 were found to be closely affiliated with a known domestic terrorist group (GROUP).

Investigators concluded that TARGET had purchased the weapon and likely sold it to PERSON 2. For this reason, they suspected PERSON 2 was the shooter and extended the investigation to include him.
To unsubscribe from this Newsletter, use the unsubscribe option in your support site profile. If you do not have a profile, please e-mail support@visualanalytics.com to unsubscribe.