STANFORD UNIVERSITY

INFORMATION TECHNOLOGY SERVICES

System Integration Strategic Vision

Written by Minh Nguye

This document covers the strategic vision for a system integration infrastructure at Stanford that encompasses system-to-system interaction between administrative systems and brokered integration via the registry to other systems on campus. While a successful integration strategy must take into consideration business processes and policies, this document focuses on the technical aspects of system integration. In addition, this document does not cover data aggregation for the purpose of generating reports, i.e. data warehousing.

Principles

The goal of system integration is to make collaboration (through data sharing) between separate systems seamless, without coupling them so tightly that they become unreliable in either development or operation. The integration infrastructure must also be broadly usable across campus, from ERP systems all the way down to departments that need the data but have little technical sophistication.

System integration at Stanford falls into the following three categories:

  • File transfer - File transfers are the most common and easiest means of sharing data between two systems. Data is extracted from the source system and dropped off at an agreed location for processing by the target system. File transfer is generally used for infrequent batch updates (e.g. once a day) because it takes time to generate and process the files, which tend to be large. If files must be produced more frequently, the problem becomes managing all the files that get produced and ensuring that every one is processed and none is lost. Once you move to fine-grained files, you should use message-based integration instead.

    Example: PeopleSoft HR to Oracle Financials interface

  • Message based - Message-based integration is used for near real-time data exchange. The source application generates a message whenever an event occurs that other applications may be interested in. A message broker provides reliable routing of messages, independent of the applications themselves. This architecture does not require the source and target systems to be available and ready at the same time.

    Example: Registry event service

  • Shared data store - A shared data store is a central data repository for multiple applications to use. Applications retrieve the shared data in real time from the data store rather than storing the data in their local databases. To minimize the ripple effect of schema changes through every application that uses the shared repository, the data store should use a well-understood, widely adopted, standards-based schema that is less likely to require regular changes.

    Example: Directory service
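
The decoupling that message-based integration provides can be illustrated with a minimal sketch. This is a toy in-memory store-and-forward broker, not the Registry event service or WebSphere MQ; the class name `Broker`, the subscriber names, and the event fields are all hypothetical.

```python
from collections import deque


class Broker:
    """Toy store-and-forward broker: holds events until a subscriber collects them."""

    def __init__(self):
        self._queues = {}  # subscriber name -> deque of pending events

    def subscribe(self, subscriber):
        # Registering creates a queue; the subscriber need not be running.
        self._queues.setdefault(subscriber, deque())

    def publish(self, event):
        # Copy the event onto every registered subscriber's queue.
        for queue in self._queues.values():
            queue.append(event)

    def receive(self, subscriber):
        # Drain and return everything that accumulated for this subscriber.
        queue = self._queues[subscriber]
        events = list(queue)
        queue.clear()
        return events


broker = Broker()
broker.subscribe("directory")    # hypothetical target system
broker.subscribe("card_office")  # hypothetical target system

# The source posts events while both targets are offline.
broker.publish({"type": "person.updated", "id": "p1"})
broker.publish({"type": "person.updated", "id": "p2"})

# Each target picks up its backlog whenever it comes online.
directory_events = broker.receive("directory")
card_events = broker.receive("card_office")
```

The source never calls a target directly, so neither side has to be up when the other is. A production broker would also persist its queues so that events survive a restart, which this sketch does not attempt.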

A system integration infrastructure should provide the following:

  • An authoritative, easy-to-access reference that lays out what data lies in which systems, what that data means, what the source system is, how one can access it, who owns that data, etc. This is currently not well or broadly understood, and there is no central site, expert or reference that can convey that.

  • A method for tracking and controlling what data is being used by which systems. Currently this is not well managed, and at any given time, it is impossible to say who is using what data and for what purpose.

  • Monitoring of feeds and integration mechanisms. We need to be able to track the status and 'health' of each link in the chain of data feeds, to determine where data has stopped when it is not being delivered, and to get this information in real time, so that the issue can be resolved before it causes major problems in downstream systems.

  • A small set of technologies that systems can use to achieve their integration requirements.
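
As a rough illustration of the monitoring requirement, the sketch below finds the first link in a feed chain that has not reported a delivery recently. The link names, the 24-hour threshold, and the function `first_stalled_link` are assumptions made for the example, not part of any existing Stanford service.

```python
from datetime import datetime, timedelta


def first_stalled_link(chain, last_seen, now, max_age):
    """Return the first link whose last delivery is missing or older than max_age."""
    for link in chain:
        seen = last_seen.get(link)
        if seen is None or now - seen > max_age:
            return link
    return None  # every link reported within the allowed window


now = datetime(2007, 6, 27, 15, 0)

# Hypothetical chain of feeds, listed in the order the data flows.
chain = ["hr_extract", "file_transfer", "registry_load"]
last_seen = {
    "hr_extract": now - timedelta(hours=2),
    "file_transfer": now - timedelta(hours=30),
    "registry_load": now - timedelta(hours=31),
}

stalled = first_stalled_link(chain, last_seen, now, timedelta(hours=24))
```

Here the extract ran on schedule but the transfer did not, so the check points at the transfer step rather than at the downstream load that merely inherited the problem.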

Technologies

Stable core technologies:

  • File transfer:
    • Timestamp attribute/field to detect changes in the source system.
    • Text delimited file format.
    • SQR, PL/SQL, and Perl to extract and process batch files.
    • SFTP, FTP, SCP, AFS for file transfer.
    • SCP-based flat file transfer service.
  • Message based:
    • Registry event service and IBM Websphere MQ (aka MQ Series) messaging brokers.
    • Event posting to notify changes with XML document service as the interface to the data.
      • Registry person, account, course data.
      • PeopleSoft person and course data.
      • Campus card person data.
      • Oracle's PTA data.
    • Event harvesting and OpenLDAP to retrieve the data.
      • Leland systems account provisioning service.
      • Active directory harvester.
    • Proxy document service for Hospital to post person XML document to the Registry. Perl to generate XML document.
    • XML message from PeopleSoft via Websphere MQ to Workflow System. Java to generate XML message.
    • XSLT to transform XML data from Registry person and account document service: Slog, Pinnacle, Campus Card, Amcom, Libraries.
    • PeopleSoft integration broker for data sharing between the PeopleSoft system and the PeopleSoft portal.
  • Shared data store:
    • OpenLDAP and Active Directory services. Perl and Java LDAP clients.
    • Registry's course document service: Coursework does a mass pull of course data four times per day.
  • Metadata repository which contains information about the data into and out of the Registry.
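
The file transfer items above (a timestamp field to detect changes, written out as a delimited text file) can be sketched as follows. This is only an illustration: it uses an in-memory SQLite database in place of a real source system, and the `person` table, its columns, and the pipe delimiter are assumptions.

```python
import csv
import io
import sqlite3

# Stand-in source system with a hypothetical last_updated timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT, name TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("p1", "Ada", "2007-06-01T08:00:00"),
        ("p2", "Grace", "2007-06-02T09:30:00"),
        ("p3", "Alan", "2007-06-03T11:15:00"),
    ],
)


def extract_changes(conn, since):
    """Return a pipe-delimited extract of rows modified after `since`."""
    rows = conn.execute(
        "SELECT id, name, last_updated FROM person "
        "WHERE last_updated > ? ORDER BY last_updated",
        (since,),
    ).fetchall()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Only rows touched since the previous run end up in the file.
extract = extract_changes(conn, "2007-06-01T23:59:59")
```

ISO 8601 timestamps compare correctly as strings, which keeps the query simple. A timestamp column cannot reveal rows that were deleted from the source, so deletions need separate handling.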

Emerging technologies:

  • Web services - Web services promise to be a platform-independent way to facilitate communication between disparate systems. This technology is evolving rapidly, and applications will increasingly use XML and SOAP as a common means of exchanging application data. Shortly before his departure, Roland Schemers began developing a web services framework for Stanford that integrates with our authentication infrastructure. We need to have someone complete this work.

  • Oracle Interconnect - R&DE is piloting Oracle's Interconnect integration broker to handle integration between its systems. R&DE would like to use this technology to connect in real-time to other systems on campus such as Oracle Financials, PeopleSoft and the Person Registry.

Deprecated technologies:

  • SLAC person flat-file transfer to the Registry.

Other technologies in use:

  • Database link - A database link is a pointer that defines a one-way communication path from an Oracle database server to another database server. The database link allows local users to access data on a remote database as if the views and tables physically resided in the local database.

  • Remote database access - A small number of school/departmental applications that need data from the Student, HR, or Financials system have query access to the database servers for these systems.

    Both of the above integration technologies are discouraged for new integration requests. The few remaining uses will be migrated to other kinds of feeds, most likely flat file transfer.

Projects

First:

  • Complete the SLAC migration to a document service architecture for exchanging person data with the Registry.

  • Replace the registry event service with a lightweight message-oriented middleware, or refactor the event service to be reliable and scalable.

  • Expand the Registry's metadata repository to include the data elements that go into and out of each system, the source system for each element, etc.

Next:

  • Complete Roland Schemers's work on the Stanford web services framework, which provides support for GSS-API sockets as an alternative to SSL.

  • Change the Registry document service to support a SOAP interface with query capability.

Later:

  • Establish an RDBMS shared data store (possibly an extension to the EDW).

  • Implement a mechanism to centrally monitor all file transfers, both the creation and processing of files.

Research

  • Determine which web services framework (WSF) standards, e.g. WS-Reliability, WS-BPEL, are mature enough and applicable to Stanford to be adopted for system integration.

  • Look into alternatives for a change data mechanism for reliably detecting changes in source systems in a non-intrusive manner. Currently, source systems send flat files that contain the entire set of data in the database because there isn't a good way to capture data that has changed since the last time the file was created.

  • Determine if a system that coordinates the data flow in a sequential manner or based on conditional logic could alleviate some of our integration problems due to timing differences in the processing by endpoints. We lack the means to integrate systems in a cohesive coordinated manner. This would enable us to begin thinking about integration at the business process level rather than at the data level.

  • Determine how Stanford's integration architecture can be aligned with Oracle and PeopleSoft's technical direction and products for system integration.
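
One non-intrusive option for the change-detection item above is to compare consecutive full extracts on the receiving side, since the source systems already send complete files. The sketch below is one possible approach under that assumption, not a recommendation from this document; the function name `diff_snapshots` and the record layout are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two full extracts keyed by record id.

    Returns three sorted lists of ids: added, changed, removed.
    """
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return added, changed, removed


# Yesterday's and today's full extracts (hypothetical data).
yesterday = {
    "p1": {"name": "Ada", "dept": "Math"},
    "p2": {"name": "Grace", "dept": "CS"},
    "p3": {"name": "Alan", "dept": "CS"},
}
today = {
    "p1": {"name": "Ada", "dept": "Math"},
    "p2": {"name": "Grace", "dept": "EE"},  # department changed
    "p4": {"name": "Edsger", "dept": "CS"},  # new record
}

added, changed, removed = diff_snapshots(yesterday, today)
```

Unlike a timestamp column, this comparison also detects deletions and requires no change to the source system. The cost is that both extracts must be held and compared on every run.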

Last modified Wednesday, 27-Jun-2007 03:23:10 PM
