September 2003  Volume 2, Issue 9   
What's New?

VisuaLinks 3.0 is just weeks away.

Yes, we said VisuaLinks 3.0, not VisuaLinks 2.1, as was previously reported. After much discussion and hand-wringing, we have decided that we've packed so much New Goodness into this upcoming release that it constitutes a major new release. The next release of VisuaLinks will be Version 3.0.

We will pore over VisuaLinks 3.0 in more detail in the coming issues. In the interim, here is a short list of some of the important new features:
  • New Dynamic Memory Management - VisuaLinks will be able to process results from extremely large databases without requiring more computer memory
  • New Shared Directory Storage - users will be able to personalize their VisuaLinks configuration and settings, including personal alias, hit, exclusion and search history lists as well as many other settings
  • New Social Network Analysis and Parallel Coordinates placement algorithms
  • New Presentation Graphics - this will allow users to annotate their link charts for presentation and distribution
  • New Basic and Advanced User Interfaces - users will be able to choose a user level setting that controls the number of features and services displayed in the interface
  • New reporting interface to Crystal Reports Enterprise
  • New customizable toolbars and menus
  • Improved, and substantially revamped, user interface
  • Improved Collaborative Analysis - better Server-to-Server functionality, improved Instant Messaging, and enhanced real-time data sharing
  • Improved Mapping and GIS features
This release of VisuaLinks will once again raise the bar for the data visualization and analysis industry.

In the midst of all of the 3.0 activity, we are also planning a maintenance release, VisuaLinks version 2.0.2, which is just around the corner. We'll notify you of both releases when they become available.

Here are a few tips and tricks to help you use VisuaLinks more quickly and efficiently.
Multiple Views
You can create multiple Views of your current information set. Each View can have a different placement algorithm applied to it.

You can create a new View in two ways:
  • From the Window menu (in the main menu bar), choose "Add A New View."
    OR
  • In the Data Toolbar, click the icon (immediately to the right of the VisuaLinks icon). In the menu that appears, choose "Add A New View."
In both cases, a new View tab is added at the top of the data view area. You can apply different analytical processes and placements to each View.

You can quickly change the current View's placement settings by clicking the small blue triangle in the lower-right corner of the Render icon in the View Toolbar. From the menu that displays, select a new placement mode for the current View.
Objects Panel lists

The Objects Panel lists each of the objects in each View. The list can also reflect the placements applied to each. You may need to change from a "flat" view of the list to a "tree" view using the panel's Detailed View option in the Functions menu.


You can close a View in either of two ways:
  • In the Data Toolbar, click the icon (immediately to the right of the VisuaLinks icon). In the menu that appears, choose "Close View Session."
    OR
  • Right-click in an open space in the View you want to close. In the menu that appears, choose "Controls," and then "Close View Session."
Network Miner
Network Miner is very useful for identifying high-level relationships (organizations and activities) based on low-level data (people, places, things, and events). An "organization" is a set of interrelated people and things with some internal cohesion (e.g., a business or terrorist cell), and an "activity" is a set of interrelated people, places, things, and events with some internal cohesion (e.g., the development of a new product or the planning of a terrorist attack).

This is an example of data that can be used for money laundering detection. Each of the activities - deposits made by Tom, pizza sold by Nirula's and the import/export business conducted by Khakhi Imports - could be legitimate, but their occurrence in this particular combination can suggest that additional investigation might be warranted.

The key facts in this example are relationships among several persons, businesses, bank accounts, and deposits. The individual records are essentially meaningless by themselves. Only the network of relationships among the people, places, things, and events forms a meaningful pattern. This is clear to nearly everyone based on what has appeared in the news media since the September 11th attacks. Headlines talked of "terrorist networks" and "links" between individuals and known terrorist groups. The stories talked of meetings, financial transactions, and familial ties.

Network Miner searches your data sources for clusters of related information. It searches for information linked together by one or more types of associations.

Network Miner can be started from the Services Center window:
The Network Miner window, shown below, lets you choose the model and type of association you want to search.

This example uses the SAR-BSA data model (a sample model shipped with VisuaLinks) to analyze the DCN-ACCOUNT and DCN-SUBJECT associations.

In addition to this basic type of request, Network Miner can also:
  • Report the lowest and highest values found for any data type (i.e., string, date, numeric) as well as averages and sums of numeric data values.
  • Filter results based on attribute values. Bear in mind that filtering a Network Miner request on a large data source is resource-intensive, so consider running such requests during off-peak hours, when demand on the Visual Clarity Server is lower.
It is recommended that you increase the Min Group Size, which specifies the smallest number of objects that make up an individual network. The default (2) will return the most basic of relationships with potentially overwhelming numbers of results. Increasing the minimum size of the network provides better results and faster response times.
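
For readers who like to see the idea in code, here is a minimal Python sketch of this kind of search. It is not how Network Miner is implemented. It assumes the associations are already available as in-memory pairs of object names, treats a "network" as a connected group of objects, and uses a hypothetical `min_group_size` parameter to stand in for the Min Group Size setting. The sample links are invented for illustration.

```python
from collections import defaultdict


def find_networks(associations, min_group_size=2):
    """Group linked objects into networks (connected components).

    associations: iterable of (object_a, object_b) pairs.
    Returns a list of sets, largest first, keeping only networks
    with at least min_group_size objects.
    """
    neighbors = defaultdict(set)
    for a, b in associations:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    networks = []
    for start in neighbors:
        if start in seen:
            continue
        # Collect every object reachable from this starting object.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_group_size:
            networks.append(component)
    return sorted(networks, key=len, reverse=True)


# Hypothetical links, loosely modeled on the money laundering example.
links = [
    ("Tom", "Deposit 1"),
    ("Deposit 1", "Account A"),
    ("Account A", "Nirula's"),
    ("Khakhi Imports", "Account A"),
    ("Ann", "Deposit 2"),
]

all_networks = find_networks(links)                    # minimum size of 2
big_networks = find_networks(links, min_group_size=3)  # drops the two-object pair
```

With the default minimum of 2, the sketch returns both groups, including the trivial pair. Raising the minimum to 3 leaves only the five-object network, which is the effect described above: fewer, more meaningful results.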

Continuing with our example, the Network Miner query returned a total of 30 networks, as shown below.


The *COUNT OF OBJECTS column tells you how many objects exist within each network found.

To view the objects and associations in a particular network, select a row in the results list and click the "Use in View" icon.

For this example, we selected row 7 (with a count of 14 objects) which yielded the network shown below.

Network Miner is a powerful tool that helps assemble data about relationships to provide a picture of higher-level organizations and activities. These, in turn, can form the basis for watch lists, indicators, and warnings of terrorist attacks and suspicious activities.



Server-to-Server
VisuaLinks Server-to-Server allows two or more VisuaLinks servers to share data and models. When configured correctly, Server-to-Server users at one location can access the data at another location. The connection is nearly seamless to the end user. Users can, within the bounds of security limits, query data from their local models and databases, and then walk that data to the remote server. Conversely, a user can start by querying the remote server and then walking data locally. These processes can be intermixed as needed, mixing and matching models locally and remotely.

Configuring VisuaLinks is relatively easy, requiring a handful of changes on both of the VisuaLinks servers involved in the connection. However, the network and firewall connectivity issues could be complex if the remote connection is made over the Internet.



In the sample configuration shown above, there are two firewalls at each location. Each of these firewalls may require configuration to enable VisuaLinks Server-to-Server connections. Also, in the configuration shown, VPN must be configured between the two corporate networks. Other configurations are possible, some of which could be more complex than this. Others might be less complex. For instance, on a secure corporate Wide Area Network, additional security may not be necessary.

In a Server-to-Server environment, the VisuaLinks user interface automatically adjusts to allow users to access the additional servers.

Figure 1 shows the Service Center window in a Server-to-Server configured environment. Note the additional tab for the remote VisuaLinks domain.


This next example, Figure 2, shows the Services menu in a Server-to-Server configured environment. Note the additional menu entry for the remote VisuaLinks domain.


Figure 3 shows the Database Query window for a remote domain. When executed, it will function exactly the same as a local Database Query.


Finally, Figure 4 shows the Data Walk confirmation window for a remote domain. When executed, it will function exactly the same as a local data walk.


VisuaLinks Server-to-Server is a powerful collaboration feature that could pave the way for unprecedented levels of data sharing.



Name Matcher

The Name Matcher searches your data for names, or other words, that are similar but not exact. For example, consider the name "Kathy Smith." When using the Name Matcher service, VisuaLinks would consider "Cathy Smyth," "Kathie Smythe" and "Cathie Smithie" as matches. It does this by comparing phonetic values of data and returning results of similar size and sound. This helps you find data matches that were previously unknown, due to differences in spelling, typographical errors or other key values (e.g., the name John Smith with three different SSNs). Names may also vary from language to language or even region to region but may refer to the same person (e.g., Osama and Usama). Name Matcher can account for these minor differences, as well.

Name Matcher searches through the attributes specified and finds all unique values and their phonetic matches. You have a number of options when running a Name Matcher request.
  • Max Returned Items lets you limit the number of values returned in a particular search so that your returned results remain smaller and more usable.
  • Break on Space causes blank spaces, slashes, parentheses, etc., to be taken into account when trying to match data. Strings of characters with these values embedded in them are broken into distinct words on the "spaces."
  • The Number of Words to Match and Minimum Word Size slider bars allow you to eliminate values based on length by dictating the minimum number of words in an attribute that must match and the minimum number of significant characters in each word, respectively.
  • The Exclude list lets you specify values that should be skipped by the Name Matcher during the search.
  • The Use Alias check box lets you decide whether or not to use your existing Alias List values during the search.
  • The last option you have is to determine the matching algorithm used:
    • Soundex works by algorithmically encoding a word based on a set of values and comparing these resulting codes.
    • Metaphone works in a similar way but focuses on how a word sounds in English.
    • String Distance lets you select how many letters in a string can vary from word to word and is effective for locating slight spelling errors.
The applications for this service are many and include finding all typing errors or variations in matched words; finding out how many names come up with multiple SSNs or account numbers; and finding out how many SSNs or account numbers come up with multiple names.
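
To make two of the matching algorithms above more concrete, here is a minimal Python sketch. It is not the Name Matcher code. It assumes the standard American Soundex rules for the phonetic encoding and uses Levenshtein edit distance as one possible string-distance measure; the function names `soundex` and `edit_distance` are our own.

```python
# Standard American Soundex digit groups.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
CODES = {ch: digit for letters, digit in _GROUPS for ch in letters}


def soundex(word):
    """Return the four-character Soundex code for a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = CODES.get(ch, "")  # vowels (and Y) have no digit
        if digit and digit != prev:
            code += digit
        prev = digit
    return (code + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: the fewest single-letter edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete a letter
                current[j - 1] + 1,            # insert a letter
                previous[j - 1] + (ca != cb),  # substitute a letter
            ))
        previous = current
    return previous[-1]


surname_codes = {name: soundex(name) for name in ("Smith", "Smyth", "Smythe")}
first_name_gap = edit_distance("KATHY", "CATHY")
```

In this sketch "Smith," "Smyth" and "Smythe" all encode to S530, so Soundex alone groups the surnames. "Kathy" and "Cathy" do not share a Soundex code, because Soundex keeps the first letter as written, but they are only one letter apart by edit distance. That is one reason it helps to have more than one algorithm to choose from.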

This month's Link Chart of the Month discusses medical insurance claims analysis. This is sample data that will demonstrate the results one can expect to see when using a simple model for medical insurance claims.

The model we are using is shown in Figure 1. Although this sample model is quite simple, a production model would probably include other identifiers (SSN, for instance) and might include additional objects that could serve to provide additional trends. For instance, including diagnostic and/or treatment codes would allow an analyst to detect whether or not certain treatments are preferred by fraudsters.

Our simple model includes claims and information related to each claim. Each CLAIM object is identified (keyed) by an ID generated by the source claims management system. The PHYSICIAN and PATIENT objects are identified by their names. The PATIENT_PHONE object is keyed on the phone number itself and the PATIENT_ADDRESS object uses the Street, City, State and Zip Code as a composite key.

In our model, we will be looking for patients that are sharing either phone numbers or addresses. We might expect to see this for spouses and dependents, but in general, each individual patient should have a single address and, perhaps, two or three phones (home, work and mobile).
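
The same check can be sketched outside of a link chart. The Python below is only an illustration under our own assumptions: the rows, names and field order are invented, and the helper names `patients_by_key` and `shared` are hypothetical. It groups patients by phone number and by the composite address key, then reports any key used by more than one patient.

```python
from collections import defaultdict

# Hypothetical rows: (patient name, phone, (street, city, state, zip)).
claims_rows = [
    ("Alice Adams", "555-0101", ("1 Elm St", "Springfield", "XX", "00001")),
    ("Brian Baker", "555-0199", ("9 Oak Ave", "Springfield", "XX", "00001")),
    ("Carla Chen", "555-0199", ("22 Pine Rd", "Shelbyville", "XX", "00002")),
    ("Dev Dutta", "555-0199", ("7 Birch Ln", "Ogdenville", "XX", "00003")),
]


def patients_by_key(rows, key_index):
    """Map each phone or address key to the set of patients that use it."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_index]].add(row[0])
    return groups


def shared(groups, min_patients=2):
    """Keep only the keys that are shared by at least min_patients patients."""
    return {key: names for key, names in groups.items()
            if len(names) >= min_patients}


shared_phones = shared(patients_by_key(claims_rows, 1))
shared_addresses = shared(patients_by_key(claims_rows, 2))
```

Here one phone number is shared by three patients with unrelated names, while no address is shared. Using the whole (street, city, state, zip) tuple as the key mirrors the composite key on the PATIENT_ADDRESS object.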

To begin our analysis, we queried our database for a particular date range and region to make the quantity of data manageable. The result is shown in Figure 2.

As you can see, we have found a number of networks. In this Figure, each network is laid out with a PHYSICIAN object to the far left of each group, a CLAIM object in the second column followed by the PATIENT object, and finally, the PATIENT_PHONE and PATIENT_ADDRESS objects on the right.

Figure 3 (below) shows a more detailed illustration of this arrangement.

If we examine the large network diagram, the two networks in the upper-left display interesting structures. In Figure 4, we zoomed in on the six networks in the upper left.

The entire bottom row and the network in the upper-right show normal structures. In each network we have a single physician connected to claims that are, in turn, connected to patients that all have single phones and addresses. Further, none of these patients has more than a single claim. These networks are normal and show no indication of fraudulent activity.

The top-center network shows some interesting crossed lines. Closer examination of this network reveals that these crossed lines are due to patients having filed multiple claims for the period we are investigating. The number of claims is low, and all of the patients have a single phone and address. This network warrants no further investigation.

This brings us to the top-left network. This network shows a very interesting structure. We can see from this structure that some of the patients are connected to more than one phone or address object. This is indicated by the fifth and sixth columns to the right of the network. Additionally, some of the patients have submitted multiple claims for the analysis period. We need to take a closer look at this network.

In Figure 5, we have zoomed in on the network and have rearranged the objects to make a little more sense. We can clearly see four distinct network structures. Each of these structures is detailed below. In the discussion that follows, the PHYSICIAN object is moved closer to the section under discussion and the other sections of the network are hidden for clarity.

The upper section of this network, magnified in Figure 6, is actually a normal network, showing a number of patients who are connected to a single claim and just one phone and/or address. We can discount this section of the network from our analysis.

The next section of the network, magnified in Figure 7, is also indicative of normal activity and shows no indication of fraud. The distinguishing characteristic in this group is that the patients have submitted multiple claims for the period. Since the number of claims is low, we need spend no further time on this section of our network.

The section of our network shown in Figure 8 shows a number of PATIENT objects that share a single PATIENT_PHONE object. Though this could be indicative of fraud, it could also be due to data entry errors. It could also be a large family that uses the same phone number in their patient records.

However, in this case, drilling down on the objects shows that the names are not at all similar. This would suggest that further investigation is in order for this section of the network.

We come now to the final section of our network diagram. In Figure 9 we see a group of patients that share an address. The possible explanations for this condition are the same as with phone numbers: name misspellings, or patients who are members of the same family.
In this case, when we examine this group in detail, we find that these are college-aged patients. In fact, they appear to belong to the same fraternity and live together in the same residence, thus explaining the shared address. This reminds us that due diligence is required to ascertain the true nature of the data.

This leaves us with the section of our network shown in Figure 8. What might we do to extend our analysis? Following are just a few suggestions.

Additional steps might include deleting all of the objects from the full network diagram that are not shown in Figure 8. This will clean up the diagram and allow us to focus on the objects of interest. Next, we could expand our analysis on this network by executing single-level data walks on the objects displayed. This would add additional objects to the diagram for these patients and for this physician.

After each data walk, the objects returned can be examined to determine if they contribute to our analysis, or if they are "noise" objects. "Noise" objects should be deleted.
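
As a rough sketch of this walk-then-prune cycle, the Python below expands a selection by exactly one level of association and then discards the objects an analyst has marked as noise. It assumes the associations are held in memory as simple pairs, which is our simplification; the function name `data_walk` and all object names are hypothetical.

```python
def data_walk(selected, associations):
    """Return the objects exactly one association away from the selection."""
    selected = set(selected)
    found = set()
    for a, b in associations:
        if a in selected and b not in selected:
            found.add(b)
        elif b in selected and a not in selected:
            found.add(a)
    return found


# Hypothetical associations around one patient and one shared phone.
associations = [
    ("Patient 1", "Phone X"),
    ("Patient 1", "Claim 10"),
    ("Claim 10", "Physician A"),  # two levels from Patient 1, so not returned
    ("Phone X", "Patient 2"),
]

walked = data_walk({"Patient 1", "Phone X"}, associations)

# Suppose the analyst decides Claim 10 adds nothing to the picture.
noise = {"Claim 10"}
kept = walked - noise
```

Repeating the walk on the kept objects extends the diagram one level at a time, which keeps each step small enough to review before moving on.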

Additional analysis activities might also include walking these objects to other data sources (models) or we might plot the objects on a temporal matrix to look for repeating temporal patterns. It may even be necessary to review the original source documentation for the objects in question.

To unsubscribe from this Newsletter, use the unsubscribe option in your support site profile. If you do not have a profile, please e-mail support@visualanalytics.com to unsubscribe.