Search Depth and Search Scope
Description
This page provides software for calculating the depth and scope of a
firm's technical knowledge over time, as expressed in the citation
patterns of its patents. These measures were created because firms differ in
how they reuse existing knowledge ("search depth"), just as they vary
in how they explore new knowledge ("search scope").
The depth measure describes how deeply a firm reuses its existing knowledge; the scope measure describes how widely a firm explores new knowledge
(see Katila, 2000; Katila & Ahuja, 2002 for details).
Depth is measured by counting how often
each citation in the current patents has occurred before, and scope by
counting how many of the current citations have never occurred
before. In other words, they measure how much the firm exploits
existing knowledge vs. explores new knowledge in its innovation search.
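The counting described above can be sketched in a few lines of C. This is only an illustration of the raw counts, not the code in depthscope.c: the function names, the use of long integers for citation IDs, and the flat arrays are all assumptions made for the example, and any normalization of the counts is not covered here (see the cited papers for the exact definitions).

```c
/* Hypothetical helper: count how many times `id` appears in
   history[0..nhist-1], i.e. how often it has occurred before. */
static int prior_occurrences(const long *history, int nhist, long id)
{
    int i, count = 0;
    for (i = 0; i < nhist; i++)
        if (history[i] == id)
            count++;
    return count;
}

/* Depth (raw count): for each current citation, add up how often
   it has occurred before. */
int search_depth(const long *history, int nhist,
                 const long *current, int ncur)
{
    int i, depth = 0;
    for (i = 0; i < ncur; i++)
        depth += prior_occurrences(history, nhist, current[i]);
    return depth;
}

/* Scope (raw count): number of current citations that have
   never occurred before. */
int search_scope(const long *history, int nhist,
                 const long *current, int ncur)
{
    int i, scope = 0;
    for (i = 0; i < ncur; i++)
        if (prior_occurrences(history, nhist, current[i]) == 0)
            scope++;
    return scope;
}
```

With earlier citations {101, 102, 101, 103} and current citations {101, 104, 103}, this sketch gives a depth of 3 (101 occurred twice before, 103 once) and a scope of 1 (only 104 is new).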
Although these measures were developed for patent data, it
is possible to use them to measure exploitation and exploration more
generally as well. For details, see
The software is distributed under the GNU GPL and can be downloaded and
used freely for research (see "License" below).
If you download the program and find it useful,
I would appreciate you letting me know (rkatila @ stanford.edu).
Implementation
The software is implemented as a standalone ANSI-C program ("depthscope") so that
very large data sets can be processed efficiently, which would be
difficult to do in a generic tool such as Matlab. The data are read
into an internal tree representation that has a small memory footprint
and is fast to process. For example, calculating the depth and scope measures
for a set of nearly 300,000 patents with over 2 million citations
takes less than 220 MB of memory and under two minutes on a Pentium 4.
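The exact tree structure used by depthscope is not described on this page. As a rough illustration of why a tree keeps lookups cheap, the sketch below stores one node per distinct citation ID in an unbalanced binary search tree and keeps an occurrence count in each node. The names (struct node, tree_add, tree_count), the fixed-size node pool, and the capacity MAXNODES are hypothetical and chosen for the example only.

```c
#include <stddef.h>

#define MAXNODES 1000   /* hypothetical capacity for this sketch */

/* One node per distinct citation ID, with its occurrence count. */
struct node {
    long id;
    int count;
    struct node *left, *right;
};

/* Nodes come from a static pool, so no allocator is needed. */
static struct node pool[MAXNODES];
static int nused = 0;

/* Add one occurrence of `id` to the tree rooted at *root.
   Returns the updated count, or -1 if the pool is full. */
int tree_add(struct node **root, long id)
{
    struct node *n;
    while (*root != NULL) {
        if (id == (*root)->id)
            return ++(*root)->count;
        root = (id < (*root)->id) ? &(*root)->left : &(*root)->right;
    }
    if (nused >= MAXNODES)
        return -1;
    n = &pool[nused++];
    n->id = id;
    n->count = 1;
    n->left = n->right = NULL;
    *root = n;
    return 1;
}

/* Return how many times `id` has been added (0 if never). */
int tree_count(const struct node *root, long id)
{
    while (root != NULL) {
        if (id == root->id)
            return root->count;
        root = (id < root->id) ? root->left : root->right;
    }
    return 0;
}
```

Each distinct ID is stored once regardless of how often it is cited, which keeps memory use low. Note that an unbalanced tree like this one degrades to a linked list if IDs arrive in sorted order.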
The code was developed on Linux. No libraries other than the
standard input/output library are used in the implementation, and the
code should therefore be easy to install and modify across
platforms. It should compile and run without modification at least on
the various Unix/Linux platforms, Mac OS X (in Terminal), and
Microsoft Windows (under Cygwin).
Installing and Running
- If you are using Mac OS X, open the Terminal application; it
will give you a Unix terminal window where depthscope can be run.
If you are using Microsoft Windows,
download and install the Cygwin Unix-like environment first.
Make sure you include the gcc compiler (in the "select packages"
menu, select "devel" and then "gcc: C compiler upgrade helper"). Open Cygwin, and
you will get a terminal window.
- Create a directory where you want depthscope to be installed.
Download the following files to that directory:
- In the terminal window, cd to that directory, and
compile the program with "gcc -o depthscope depthscope.c".
You will get an executable file called "depthscope".
- Run the test file with "./depthscope example-inputdata
testresults". The program should warn that two patents were
ignored (as shown in "example-erroroutput"), and create an output
file "testresults". This file should be identical to
"example-results".
Applying Depthscope to Your Own Data
You can apply depthscope to your own dataset by (1) creating a file of
historical data similar to "example-inputdata", (2) changing a few
compiler constants in depthscope.c and recompiling it, and (3) running the recompiled
program with your dataset as input.
The file format
is described at the beginning of the "example-inputdata" file. Your dataset should be
clean and in a consistent format. The program does check, however,
that each patent entry occurs only once and that the patents fall within the
sample years FIRSTYEAR-LASTYEAR (given as compiler constants near the
beginning of the program). Patents that fail these checks are
ignored, and a message is written to standard output.
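The two checks can be pictured roughly as follows. This is a sketch under stated assumptions, not the code in depthscope.c: FIRSTYEAR and LASTYEAR are the constant names used by the program, but the values shown here, the MAXPATENTS limit, the function name accept_patent, the linear duplicate search, and the message wording are all made up for illustration.

```c
#include <stdio.h>

#define FIRSTYEAR  1980   /* illustrative value only */
#define LASTYEAR   1999   /* illustrative value only */
#define MAXPATENTS 1000   /* hypothetical capacity for this sketch */

/* IDs of the patents accepted so far. */
static long seen_ids[MAXPATENTS];
static int nseen = 0;

/* Returns 1 if the patent is accepted, 0 if it is ignored.
   Ignored patents produce a message on standard output. */
int accept_patent(long id, int year)
{
    int i;
    if (year < FIRSTYEAR || year > LASTYEAR) {
        printf("Patent %ld ignored: year %d outside %d-%d\n",
               id, year, FIRSTYEAR, LASTYEAR);
        return 0;
    }
    for (i = 0; i < nseen; i++) {
        if (seen_ids[i] == id) {
            printf("Patent %ld ignored: duplicate entry\n", id);
            return 0;
        }
    }
    if (nseen >= MAXPATENTS) {
        printf("Patent %ld ignored: table full\n", id);
        return 0;
    }
    seen_ids[nseen++] = id;
    return 1;
}
```

In this sketch a patent rejected for its year is not recorded, so a later entry with the same ID and a valid year would still be accepted. Whether depthscope behaves the same way is not stated on this page.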
Data are read into an internal tree format, and the depth and scope measures are
calculated based on patent IDs and references during the previous
NPREYEARS (compiler constant) years. The output file lists the number of patents and the
calculated depth and scope measures for each year.
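To make the role of NPREYEARS concrete, the sketch below restricts the count of earlier occurrences to a window of years. NPREYEARS is the program's constant name, but its value here is illustrative, and the struct and function names are hypothetical. The sketch also assumes that the window for a given year covers the NPREYEARS years immediately before it and excludes the year itself; check the debug output to confirm how depthscope draws this boundary.

```c
#define NPREYEARS 5   /* illustrative value only */

/* A patent ID or reference together with the year it was recorded. */
struct record {
    int year;
    long id;
};

/* Assumption: the window for `current` is the NPREYEARS years
   immediately before it, excluding `current` itself. */
int in_window(int year, int current)
{
    return year >= current - NPREYEARS && year < current;
}

/* Count occurrences of `id` among the records inside the window. */
int window_occurrences(const struct record *hist, int n,
                       int current, long id)
{
    int i, count = 0;
    for (i = 0; i < n; i++)
        if (hist[i].id == id && in_window(hist[i].year, current))
            count++;
    return count;
}
```

For example, with records of ID 7 in 1990, 1991, 1995, and 1996 and a current year of 1996, only the 1991 and 1995 records fall inside the five-year window, so the count is 2.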
If the compiler constant DEBUG is set to TRUE, debugging output is written to
standard output. It includes (for each year) a list of all patent
IDs, all references (unique), and all patent IDs and references
(nonunique) during the previous NPREYEARS. The debug output is useful
for understanding how the calculations are done, and checking that they
are done correctly for new data. An example debug output file (for the
example-inputdata run) can be downloaded from example-debugoutput.
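A compile-time switch of this kind is commonly written as shown below. The constant names DEBUG and TRUE come from the description above; the function debug_print_ids and the output layout are hypothetical and do not reproduce the format of example-debugoutput.

```c
#include <stdio.h>

#define TRUE  1
#define FALSE 0
#define DEBUG TRUE   /* set to FALSE and recompile to silence the output */

/* When DEBUG is TRUE, print the IDs recorded for one year on a single
   line. Returns the number of IDs printed (0 when DEBUG is FALSE). */
int debug_print_ids(int year, const long *ids, int n)
{
    int i, printed = 0;
    if (DEBUG) {
        printf("%d:", year);
        for (i = 0; i < n; i++) {
            printf(" %ld", ids[i]);
            printed++;
        }
        printf("\n");
    }
    return printed;
}
```

Because DEBUG is a constant, the setting takes effect only after the program is recompiled.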
License
Version 2.0, Copyright (C) 1999-2005 Katila, Miikkulainen.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License
version 2 as published by the Free Software Foundation. This
program is distributed in the hope that it will be useful, but without
any warranty, without even the implied warranty of merchantability or
fitness for a particular purpose. See the GNU General Public License
for more details.