**The Pathfinder algorithm: the original, binary, Fast and MST-variants**

**The algorithm**

A Pathfinder Network (or PFNET) is a scaled network (or graph, in the sense of the graph theory) in which weighted edges have been pruned in a specific way. Only those edges which do not violate the triangle inequality are kept. The triangle inequality states that the direct distance between a couple of nodes must be lesser than or equal to the distance between any other path linking these nodes. The corresponding algorithm, first devised by Schvaneveldt (1990) has two parameters: r, which define the Minkowski r-metric used to compute the mentioned distance, and q, which define the maximum length of the paths considered in the triangle inequality. Pathfinder Networks are very important in the field of social networks, in which it helps to exhibit a unique representation of the underlying structure of the domain. In this context, the weight of the edges often represents the similarity between the entities given by the nodes.

According to the litterature, the most used values for its parameters are r=∞, meaning that we keep the edges having a value higher that the maximum similarity given by all the other paths, and q=n-1, meaning that we consider the paths having any kind of lengths, including the maximal one, n-1.

MST-Pathfinder, up to our knowledge, is the fastest algorithm to obtain Pathfinder Networks for r=∞ and q=n-1.

**Applications**

These networks are used in a large variety of applications including:

- Co-citation analysis (Buzydlowski, 2002)
- Latent knowledge visualization (Chen et al., 2001)
- Scientific domain visualization (Chen, 1998a)
- Communication networks (Shope et al., 2004)
- Animated visualization models of toxins (Chen & Morris, 2003)
- Mental models (Kudikyala & Vaughn, 2005)
- Debugging multi-agent systems (Serrano, 2010)
- and more recently the co-firing of rules in fuzzy systems (Alonso, 2012)

Additional information on the Pathfinder Networks can be found on the corresponding Wikipedia webpage: http://en.wikipedia.org/wiki/Pathfinder_networks

The following table shows a comparison of the variants of the Pathfinder algorithm, along with their performance in terms of time and memory, took from the paper [3]. We also provide a link to the source code (written in C). Some ZIP files contain a Makefile to compile the code on Unix-like platforms, and Visual Studio 6 project files to compile the code on Windows-like platforms, but the code should compile on both platforms.

Name of the algorithm |
Application domain | Time complexity (for q=n-1) |
Space complexity |
Approach in algorithm theory |
Download | Academic paper |

Original PF | Any valid values for q and r, (un-)directed graphs |
O(q · n^{3}) = O(n^{4}) |
2 · q · n^{2} = 2 · n^{3} - 2 · n^{2} |
Dynamic programming | download link | |

Binary PF | Any valid values for q and r, (un-)directed graphs |
O(log(q) · n^{3})= O(n ^{3} · log(n)) |
4 · n^{2} |
Dynamic programming | download link | |

Fast PF | Any valid values for r, q=n-1, (un-)directed graphs |
O(n^{3}) |
2 · n^{2} |
Dynamic programming | download link | |

Fast PF | Any valid values for q and r, (un-)directed graphs |
O(n^{3}) |
2 · n^{2} |
Dynamic programming | download link | |

MST-PF (theoretical) |
r=∞, q=n-1, undirected graphs |
O(n^{2} · log(n)) |
3 · n^{2}+n |
Greedy approach | download link | |

MST-PF (practical) |
r=∞, q=n-1, undirected graphs |
O(n^{3}) |
3 · n^{2}+n |
Greedy approach | download link |

Important notices:

- The source code given above for MST-theoretical and MST-practical are exactly the same. You will have to change the value of the define called 'BASIC_FUNCTIONS' in the file global.h to select the desired behavior. With a value of 0 (the default option), we run the fastest version of the algorithm which use index-based disjoint sets to store intermediate results. Even if the theoretical time complexity is larger, it is the fastest version: see the academic paper for more details. With a value of 1, we run the slowest version of the algorithm, which use the forest-based disjoint sets having the smallest time complexity.
- In all the variants above, and because we have implemented it for a specific case, which is the study of the Visual Science Maps, or Scientograms, the weights of the edges of the graphs correspond to a distance and not a similarity (we are looking for a network having smaller distance or weight values for their edges, so we are pruning the edges having large values).
- For the same reason, we have only implemented the case r=∞ for the Original, the Binary and the Fast variants of Pathfinder, even if theoretically, other values of r can be used. r=∞, which consider that the weight of a path is equal to the maximum value of the weights of any of its edge, is also the most common scenario. This allowed us to directly hard code the computation of the maximum instead of the Minkowski metric which would have been slower. Please, check the functions Programacion_Dinamica() or sumasims() if you need to change this behavior.

Running the source code

After the compilation (just type 'make' in command line on Linux), run it this way:

<program> <filename> <q>

- <filename> indicates the filename of the Pajek-formated description of the graph.
- <q> defines the maximum length of the paths considered in the triangle inequality, and is only available for the Original and the Binary variants of Pathfinder.

Example of a graph given in the format of Pajek, by edges:

*vertices 6 1 "1" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 2 "2" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 3 "3" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 4 "4" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 5 "5" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 6 "6" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald *edges 1 2 1 2 3 1 1 6 1 6 5 2 5 4 1 4 3 1 |

Example of a graph given in the format of Pajek, by an adjacency matrix:

*vertices 4 1 "1" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 2 "2" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 3 "3" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald 4 "4" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0 ic Emerald *matrix 0 1 4 5 2 0 2 4 1 4 0 1 5 3 1 0 |

More (small) example maps are given here: download link

- The Network Workbench (Original, Fast and MST-Pathfinder, tool in Java): http://nwb.cns.iu.edu/index.html
- The Cyberinfrastructure Shell (Original, Fast and MST-Pathfinder, tool in Java): http://wiki.cns.iu.edu/display/CISHELL/Algorithms
- The GUAJE Fuzzy freeware (MST-Pathfinder, tool in Java): http://sourceforge.net/projects/guajefuzzy/

Relevant publications

[1] R. W. Schvaneveldt
(Ed.); Pathfinder
Associative Networks: Studies in Knowledge Organization; Norwood, NJ:
Ablex (1990).

RECEIVED:
16/4/2007; REVI

[2] A.
Quirin, O. Cordon, J.
Santamaria, B. Vargas-Quesada, F. Moya-Anegon; A new Variant of the
Pathfinder Algorithm to Generate Large Visual Science Maps in Cubic
Time; Information, Processing & Management Journal, 44(4):
1611-1623
(2008).

RECEIVED:
16/4/2007; REVISED:
3/9/2007; ACCEPTED:
8/9/2007;
IMPACT
FACTOR
2007: 1.500.; CATEGORY: COMP.
SCI., INF. SYST; ORDER:
27/92; DOI:10.1016/j.ipm.2007.09.005

[3] A. Quirin, O. Cordon, V. P. Guerrero-Bote, B. Vargas-Quesada, F. Moya-Anegon; A Quick MST-based Algorithm to Obtain Pathfinder Networks; Journal of the American Society for Information Science and Technology, 59(12): 1912-1924 (2008).

SUBMITTED: 12/9/2007; REVISED: 21/2/2008; NOTIFICATION OF ACCEPTANCE: 14/4/2008; PUBLISHED: 8/7/2008; IMPACT FACTOR 2007: 1.436; CATEGORY: COMP. SCI., INF. SYST; ORDER: 29/92; DOI:10.1002/asi.20904

[4] E.
Serrano, A. Quirin, J. Botia, O. Cordon; Debugging Complex Software
Systems by Means of Pathfinder Networks; Information Sciences, 180(5):
561-583 (2010).

NOTIFICATION
OF
ACCEPTANCE: 3/11/2009; IMPACT
FACTOR 2009: 3.291; DOI: 10.1016/j.ins.2009.11.007