Report on the SIENA/SoNIA work session
ICS program, Groningen, week of April 4-8, 2005
prepared by Skye Bender-deMoll (programmer and co-author of SoNIA)
Briefly, the StOCNET / SIENA software is a package for estimating statistical models of network dynamics and simulating plausible sequences of changes in a network over time. SIENA is written in Delphi and runs on Windows machines. SoNIA is a Java package for visualizing and browsing dynamic networks in as "realistic" a way as possible, to facilitate understanding and exploration of networks that change over time. Earlier conversations suggested that there might be a very useful complementary relationship between the programs. The visualization component of SoNIA could be used to display the output generated by SIENA to help in model checking. At the same time, SIENA's simulation processes can provide a technique for creating a series of intermediate events (helpful for creating stable animations) when only panel data (slice matrices) are available for a network. After conversations at the Sunbelt conference, Tom Snijders (original author of SIENA) invited me to visit Groningen to explore and test some of the possibilities for linking the programs. Many thanks to Tom Snijders, Christian Steglich and Michael Schweinberger for their ideas and assistance, and to the ICS program and ProGAMMA for providing the support that made this visit possible.
I would like to mention briefly that SoNIA is not the only option for visualizing networks over time, although it may currently be the most advanced. Another very useful option is the "sna" package (by Carter Butts) for R. Its function "gplot()" takes an input matrix and generates a (static) image of the network. This can be very useful for quickly generating an image, and the package may gain more powerful functions in the future. HP Research Labs has recently released a software package named GUESS, which is built on the JUNG Java graph libraries and can generate movies of networks. However, it does not yet seem able to work with time to generate an animation as smoothly as SoNIA. My current employer, ATA S.p.A., is also developing a fairly advanced network data-mining software package (DyNet), which is likely to have academic versions available in the future, particularly of its visualization component "GraphVista."
Some Points of Discussion
Some initial conversation was necessary to coordinate our understanding of the conceptual models of time used by the two software packages. SoNIA works best when the input data are "events" with a given start time and duration, which indicate the existence of a node or arc and specify its attributes (weight, color, etc.) for that time period. These intervals are then "sliced" to create graphs for visualization at various points in time. The simulation stage of SIENA generates a sequence of discrete "state changes" for tie values (0, 1) and for discrete-valued behavior variables, which approximate (interpolate) the network transition process between two (or more) graph matrices. In most situations, the exact timings of the transitions are integrated out to simplify the calculation. To translate this into a form readable by SoNIA, it is necessary to record the time whenever a tie changes state to "on" (the start of an interval), and then record the end time of the interval when the tie changes to "off" or the simulation ends. (A similar process will be necessary for time-varying attributes.)
We realized that it was useful to add an additional period of some duration (longer than the slice size to be used in SoNIA) to the end times of nodes and arcs that are still present at the end of the simulation. This ensures that they do not immediately disappear (as indicated by their exit time) at the last frame of the animation.
In some of the preliminary tests we noticed interesting features where large numbers of tie changes occurred in short periods of time, seemingly indicating that "cascades" of events were triggered by a single initial change. It was later pointed out that these cascades were in fact an artifact: we were using the wrong table to generate the times of tie changes (it did not contain the expected waiting times). However, this raises an interesting question: if we used the "real" waiting times, would such cascades (sudden bursts of change touched off by a single event) be observed in the data?
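The conversion from state changes to intervals, including the padding of end times, can be sketched as follows. This is a minimal Python illustration, not SIENA's actual code. It assumes the simulated changes are available as (time, sender, receiver, new value) tuples and that ties present at the start are listed separately. The function and parameter names (`changes_to_intervals`, `end_padding`) are hypothetical. Writing the resulting intervals in the .son file format is not shown.

```python
from typing import Dict, List, Tuple

# A tie change: (time, sender, receiver, new_value) with new_value in {0, 1}.
Change = Tuple[float, int, int, int]
# An arc interval: (sender, receiver, start_time, end_time).
Interval = Tuple[int, int, float, float]


def changes_to_intervals(
    changes: List[Change],
    initial_ties: List[Tuple[int, int]],
    start_time: float,
    end_time: float,
    end_padding: float,
) -> List[Interval]:
    """Hypothetical helper: turn tie state changes into arc intervals.

    Ties present in the initial network are treated as switched on at
    start_time. Ties still on when the simulation ends get an end time of
    end_time + end_padding, so they do not vanish on the last frame.
    """
    open_since: Dict[Tuple[int, int], float] = {tie: start_time for tie in initial_ties}
    intervals: List[Interval] = []
    # Stable sort by time only, so simultaneous changes keep their recorded order.
    for time, i, j, value in sorted(changes, key=lambda c: c[0]):
        tie = (i, j)
        if value == 1 and tie not in open_since:
            # Tie switches on: remember when its interval starts.
            open_since[tie] = time
        elif value == 0 and tie in open_since:
            # Tie switches off: close the interval at this time.
            intervals.append((i, j, open_since.pop(tie), time))
    # Close every tie that is still on at the end, padded past the last slice.
    for (i, j), since in open_since.items():
        intervals.append((i, j, since, end_time + end_padding))
    return sorted(intervals)
```

A tie that is switched on, off, and on again simply produces two separate intervals. This matches SoNIA's event model, in which each interval is its own record.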
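To explore the cascade question, one option is to assign times to the simulated steps from a waiting-time distribution and then look for bursts. The sketch below assumes, for illustration only, that every actor has the same constant change rate. Under that assumption the waiting time to the next step is exponential with total rate (number of actors times rate), as in a simple continuous-time Markov chain. SIENA's actual rate functions can depend on actor covariates and network position. Both function names are hypothetical, and `max_events_in_window` is a deliberately crude burst measure.

```python
import random
from bisect import bisect_left
from typing import List, Optional


def timestamps_from_rates(n_steps: int, n_actors: int, rate: float,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Assign times to n_steps successive steps, assuming (hypothetically) an
    equal, constant rate for every actor, so each waiting time is exponential
    with total rate n_actors * rate."""
    if rng is None:
        rng = random.Random()
    total_rate = n_actors * rate
    if total_rate <= 0:
        raise ValueError("total rate must be positive")
    t = 0.0
    times = []
    for _ in range(n_steps):
        t += rng.expovariate(total_rate)  # exponential waiting time
        times.append(t)
    return times


def max_events_in_window(times: List[float], width: float) -> int:
    """Largest number of events in any half-open window [t, t + width) that
    starts at an event time: a rough indicator of bursts or 'cascades'."""
    times = sorted(times)
    best = 0
    for k, start in enumerate(times):
        # Index of the first event at or after start + width.
        end = bisect_left(times, start + width)
        best = max(best, end - k)
    return best
```

The burst measure from simulated times could then be compared with the value expected when changes arrive at a steady rate.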
We discussed many possibilities for which simulation variables or covariates it would be interesting to include in the network to be visualized. I will only mention the ones I recall the best:
- Entropy: using the estimated entropy of a node (and of arcs?), i.e. how well determined its decisions are, as size or color.
- Happiness: using the nodes' satisfaction scores to set size or color (or border color). For example, do unhappy nodes turn red before making their choice? Both this and the entropy option might require writing the states of all nodes to the file each time a change takes place in the network (to allow them to update).
- Coding ties by which term of the utility function was "dominant" in causing tie creation, e.g. ties "created" by reciprocity are blue, those created by transitivity are red, and so on.
- Interpolating networks by sampling probability distributions. We have mostly been considering the "network" as the result of a single run of the simulation, but another possibility would be to include the data from multiple runs in a single file. When SoNIA aggregates the slices with a sum operation, this would create a network in which the tie weights correspond roughly to the probability of a tie occurring (across multiple runs). (This may simply create a very dense network. Or is it possible to estimate the probability of each tie at a specified interval directly?)
- Including multiple runs in a single file to visualize "overlap" in the generated sequences. This is the same as the previous item, but uses a different color for the ties belonging to each run, to show how closely the runs match.
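The multiple-run aggregation idea can be sketched briefly in Python. The sketch assumes each run has already been converted into arc intervals (sender, receiver, start, end). SoNIA's sum aggregation would add raw weights. This hypothetical function instead counts each tie at most once per run and divides by the number of runs, so the result reads directly as the fraction of runs in which the tie is present during the slice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int, float, float]  # (sender, receiver, start, end)


def tie_frequencies(runs: List[List[Interval]], slice_start: float,
                    slice_end: float) -> Dict[Tuple[int, int], float]:
    """Hypothetical helper: fraction of simulation runs in which each tie is
    present at some point during [slice_start, slice_end)."""
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for run in runs:
        # Count each tie at most once per run, even if it has several intervals.
        present = {(i, j) for i, j, s, e in run if s < slice_end and e > slice_start}
        for tie in present:
            counts[tie] += 1
    return {tie: c / len(runs) for tie, c in counts.items()}
```

The resulting frequencies could be written as arc weights. Alternatively, for the "overlap" variant, each run's intervals could keep their own color instead of being merged.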
Modifications to the SoNIA Software
After viewing the preliminary outputs from the SIENA estimation procedure, it became clear that a method to visually emphasize the changing arcs in the network would be very helpful. After some discussion, we added an option to SoNIA to "flash" newly added events. (This option was added to the graphic settings panel.) Because different networks can have very different time scales, an option to specify the duration of the flash was added as well.
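The flash rule amounts to a simple time-window test. This is a hypothetical sketch of the idea, not SoNIA's source code. It assumes the flash duration is given in the same time units as the network, for example 0.01 on a 0 to 1 time scale.

```python
def is_flashing(event_start: float, current_time: float, flash_duration: float) -> bool:
    """Hypothetical rule: an event is highlighted if it began no more than
    flash_duration before current_time (and not after it)."""
    return event_start <= current_time < event_start + flash_duration
```

Expressing the duration in network time rather than in animation frames keeps the flash meaningful across networks with very different time scales.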
Various possibilities were discussed for linking SoNIA to SIENA in a way that would make constructing an animation of SIENA model output as simple as possible. One option would be to modify the StOCNET program to transfer control (and an output file) to SoNIA after model generation. To facilitate possibilities like this, SoNIA was modified so that it can accept a file name (including a full path) as a parameter when it is launched from the command line. The specified file is loaded automatically on startup, which also speeds up testing SoNIA. It should now be easy to launch SoNIA from another program or batch file, even though the process of constructing an animation is still far from automated.
One potential difficulty for people who would like to use SoNIA in combination with SIENA is that SoNIA requires the installation of some additional packages that are not included by default in the Windows operating system. Most basically, SoNIA requires that Java be installed. SoNIA also relies on the Colt Java numerics package for scientific computing, which must be downloaded, unpacked, and placed in the same directory. During this work session SoNIA was modified to give a more informative warning if the Colt package is missing, and hopefully it will soon be possible to bundle Colt as part of the download. Ideally, it may also be possible to remove the dependency on the QTJava (QuickTime for Java) package by coding the movie output differently, or by figuring out how to generate Flash (.swf) vector animation files, but this is a longer-term project.
We realized it would be useful to be able to include multiple variables in the input file that SIENA generates for SoNIA. If it were only possible to specify which covariates will be mapped to the graph attributes before running the simulation, it would not be possible to examine multiple display possibilities for the results of a single run. By saving multiple variables to the file (for example "entropy", "happiness", etc.), we give the user the option to try out various schemes for depicting the network (using entropy as node size, etc.). To make this possible, SoNIA was changed so that if additional columns are included in the input data (columns whose names are not recognized as the usual graph attributes), it presents a dialog that lets the user specify which attribute each extra column should be used for before parsing proceeds. However, it does not yet have the ability to transform real-valued variables into categorical ones (color names, shapes, etc.), so such variables must be put into the appropriate form when the input file is generated.
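As an illustration of launching SoNIA from another program, the Python sketch below builds the command shown in the step-by-step instructions (`java -jar sonia_1_1_SIENA.jar file:"path/filename"`) and starts it as a separate process. The jar name and the `file:` prefix are copied from those instructions. The helper names are hypothetical, and Java must be on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def sonia_command(jar: str, network_file: str) -> List[str]:
    """Hypothetical helper: build the command that starts SoNIA with a network
    file preloaded, passing the full path after the file: prefix."""
    full_path = str(Path(network_file).resolve())
    # Passed as a list, so no shell quoting of the path is needed.
    return ["java", "-jar", jar, "file:" + full_path]


def launch_sonia(jar: str, network_file: str) -> subprocess.Popen:
    """Start SoNIA without waiting for it to exit (requires Java on the PATH)."""
    return subprocess.Popen(sonia_command(jar, network_file))
```

A batch file could make the same call directly. Wrapping it in a program lets a simulation run hand its output file to SoNIA as soon as it is written.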
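Until SoNIA can do this itself, real-valued variables could be binned into categories when the input file is written. The helper below is hypothetical. The thresholds and color names in the usage example are illustrative assumptions, and the color names SoNIA actually accepts should be checked against its file format documentation.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_to_categories(values: Sequence[float], thresholds: Sequence[float],
                      labels: Sequence[str]) -> List[str]:
    """Map real values to category labels: values below thresholds[0] get
    labels[0], values in [thresholds[k-1], thresholds[k]) get labels[k], and
    values at or above the last threshold get the last label.
    thresholds must be sorted in increasing order."""
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right counts how many thresholds are <= v, which is the bin index.
    return [labels[bisect_right(thresholds, v)] for v in values]


# Example: split a 0-1 "entropy" score into three illustrative colors.
colors = bin_to_categories([0.1, 0.5, 0.9], [1 / 3, 2 / 3], ["blue", "gray", "red"])
```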
Modifications to the SIENA Software
Design of an input "steering file" for SIENA to specify which variables should be included in the SoNIA input when it is generated, how numeric variables should be mapped to categorical attributes like color, and which covariates should control which graph display attributes. [I have the draft version, but we should get the "final" specification from Christian.] The idea is that if this file is present in the directory of inputs for SIENA, SIENA will generate a SoNIA (and possibly, in the future, a Pajek) input file at the end of the simulation run.
Changes to the simulation code to permit it to process the recorded sequence of arc transitions and node entrances/exits and write them to the SoNIA input file (.son) format. Currently this works only for the particular estimation procedure that already had a data structure recording tie changes. In the future, this should be extended to the other procedures. Also, the sequence is only generated for one time interval, and should be extended to multiwave networks. At the moment, SIENA can only generate one entrance and one exit for each node.
Changes to the simulation code to write static actor covariates to the SoNIA input file as graph attributes such as color, size, shape, border color, and border size. Writing the dynamic covariates ("behaviors") and mapping variables to arc colors are not yet implemented.
Possible Future Work
- It may be very interesting to actually calculate the waiting times for arc/attribute changes in the SIENA estimation procedure. Do different models have different implications about when tie changes will occur? Could the distribution of changes over time be an indication of model fit?
- Including dynamic covariates (behaviors) as graphical attributes in the output file from SIENA would be a powerful extension. We would then be able to observe contagion effects, segregation, etc., as the network structure unfolds.
- Modifications to SoNIA to handle the aggregation of attributes within slices in a more sophisticated fashion. Currently, if a bin contains multiple attribute definitions for a single node or tie, all of them are simply drawn. This is not very elegant and could be confusing. It would be good to implement a better way of aggregating numeric variables, or choosing between categorical variables, when these conflicts occur.
- A related issue may arise if SIENA generates event records for all attribute changes in a file. Currently, the SoNIA input file requires that if an attribute is specified (by including a column name), all records (events) must include a value for that attribute. If several covariates change simultaneously, and each change requires a line to be recorded in the file, this could cause problems in aggregation for "thick" slices. A possible solution would be to change the SoNIA input file definition to allow null values in input columns, so that a record can specify only the necessary changes. This should be a simple change to implement.
- Add a feature in SoNIA to flash arcs/nodes immediately before they disappear.
- Including labels in data from SIENA (SPSS id numbers, etc) to make it possible to associate the nodes in an animation with the original data.
- There are many minor and major usability and research improvements to be made in SoNIA, such as implementing, developing, and testing additional algorithms and optimization techniques, stability testing of layouts, variable assignments, etc.
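One way to resolve the attribute conflicts described in the list above is sketched below. It also handles records that leave some columns empty, as proposed for null values. The rule is hypothetical and is not current SoNIA behavior. Numeric values recorded for the same node or arc within one slice are averaged. Categorical values are resolved by taking the most recent one. `None` stands for an unspecified value.

```python
from typing import List, Tuple, Union

Value = Union[float, str, None]


def aggregate_attribute(records: List[Tuple[float, Value]]) -> Value:
    """Hypothetical rule for combining several values of one attribute that
    fall in the same slice. records are (time, value) pairs; None marks a
    record that leaves this attribute unspecified and is ignored.

    Numeric values are averaged; otherwise the most recent value wins."""
    present = [(t, v) for t, v in records if v is not None]
    if not present:
        return None
    values = [v for _, v in present]
    if all(isinstance(v, (int, float)) for v in values):
        return sum(values) / len(values)
    # Categorical: take the value with the latest time stamp.
    return max(present, key=lambda tv: tv[0])[1]
```

Taking the most frequent category instead of the most recent one would be an equally reasonable choice. The most recent value has the advantage of reflecting the state at the end of the slice.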
SoNIA Settings for Visualizing a SIENA Network
Additional instructions and definitions of the input file formats are included on the SoNIA website: http://sonia.stanford.edu/
Please note that these instructions use as input the files generated by a special, unreleased development version of SIENA; it is not possible to load SIENA's normal output files directly into SoNIA.
Very rough instructions for animating the result of a SIENA simulated network:
1) Load the network. Click on the jar file to launch the program, and then click "Load Network". Or launch it from the command line (from the directory containing SoNIA) with
java -jar sonia_1_1_SIENA.jar file:"path/filename"
If there are additional variables included as extra columns in the file, it will give dialogs to select which graph attributes to use them for. The log window should show the comments from the input file. (generation parameters, number of events, etc.)
2) Create a layout. Click the "Create Layout" button. For the "start time" enter 0. For the "end time" enter 1.0 (assuming that the SIENA network has a time range from 0 to 1). There are many possibilities for slice duration and delta, but 0.05 and 0.05 may give reasonable results. Use the default settings for the rest (Multi-component KK).
3) Apply the layout to the first slice. Click the "Apply Layout" button. Check "randomize" (for the initial positions). The "optimum distance" parameter controls how large the network will be scaled; a value of 30.0 should give a good result for the default window size. Click "Apply" (this will apply only to the current slice).
4) Apply to the remaining slices. After the first layout finishes, click "Apply Layout" again. This time check "from previous slice" instead of "randomize" (this chains the layouts by starting each from the solution of the previous one). Click "Apply to Remaining" to apply to the current and all subsequent slices. When the layouts have finished, hit Return in the "Layout (slice)" field of the layout window to return to the first (0) layout.
5) Set it to highlight new events. Click the "View Options" button, check "Flash new events", and set the value (the duration of the flash) to 0.01.
6) Play the movie. Click the ">" button.
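Steps 2 and 4 depend on how the time range is cut into slices. The sketch below shows one plausible slicing rule. It assumes each slice is a half-open window [s, s + duration) with start times stepping by delta, and that an arc belongs to a slice if its interval overlaps the window at all. SoNIA's exact treatment of slice boundaries may differ, and both function names are hypothetical. With start 0, end 1, and duration and delta both 0.05, this rule produces 20 non-overlapping slices.

```python
from typing import List, Tuple

Interval = Tuple[int, int, float, float]  # (sender, receiver, start, end)


def make_slices(start: float, end: float, duration: float,
                delta: float) -> List[Tuple[float, float]]:
    """Hypothetical slicing rule: windows [s, s + duration) whose start s runs
    from start in steps of delta while s < end."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    slices = []
    k = 0
    while True:
        s = start + k * delta  # multiply rather than accumulate to limit float drift
        if s >= end:
            break
        slices.append((s, s + duration))
        k += 1
    return slices


def arcs_in_slice(intervals: List[Interval],
                  window: Tuple[float, float]) -> List[Tuple[int, int]]:
    """Arcs whose interval overlaps the half-open window at all."""
    s, e = window
    return sorted({(i, j) for i, j, a, b in intervals if a < e and b > s})
```

Choosing a delta smaller than the duration gives overlapping slices. That smooths the animation, at the cost of each change appearing in several consecutive layouts.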