The Pathfinder algorithm: the original, binary, Fast and MST-variants

The algorithm

A Pathfinder Network (or PFNET) is a scaled network (or graph, in the sense of the graph theory) in which weighted edges have been pruned in a specific way. Only those edges which do not violate the triangle inequality are kept. The triangle inequality states that the direct distance between a couple of nodes must be lesser than or equal to the distance between any other path linking these nodes. The corresponding algorithm, first devised by Schvaneveldt (1990) has two parameters: r, which define the Minkowski r-metric used to compute the mentioned distance, and q, which define the maximum length of the paths considered in the triangle inequality. Pathfinder Networks are very important in the field of social networks, in which it helps to exhibit a unique representation of the underlying structure of the domain. In this context, the weight of the edges often represents the similarity between the entities given by the nodes.

According to the litterature, the most used values for its parameters are r=∞, meaning that we keep the edges having a value higher that the maximum similarity given by all the other paths, and q=n-1, meaning that we consider the paths having any kind of lengths, including the maximal one, n-1.

MST-Pathfinder, up to our knowledge, is the fastest algorithm to obtain Pathfinder Networks for r=∞ and q=n-1.

Applications

These networks are used in a large variety of applications including:


More resources

Additional information on the Pathfinder Networks can be found on the corresponding Wikipedia webpage: http://en.wikipedia.org/wiki/Pathfinder_networks

The following table shows a comparison of the variants of the Pathfinder algorithm, along with their performance in terms of time and memory, took from the paper [3]. We also provide a link to the source code (written in C).  Some ZIP files contain a Makefile to compile the code on Unix-like platforms, and Visual Studio 6 project files to compile the code on Windows-like platforms, but the code should compile on both platforms.


Name of the
algorithm
Application domain Time complexity
(for q=n-1)
Space
complexity
Approach in
algorithm theory
DownloadAcademic paper
Original PF Any valid values for q and r,
(un-)directed graphs
O(q · n3) = O(n4) 2 · q · n2 = 2 · n3 - 2 · n2 Dynamic programming download link http://interlinkinc.net/PFBook.zip
Binary PF Any valid values for q and r,
(un-)directed graphs
O(log(q) · n3)
= O(
n3 · log(n))
4 · n2 Dynamic programming download link
Fast PF Any valid values for r,
q=n-1, (un-)directed graphs
O(n3) 2 · n2 Dynamic programming download link
Fast PF Any valid values for q and r,
(un-)directed graphs
O(n3) 2 · n2 Dynamic programming download link
MST-PF
(theoretical)
r=∞q=n-1,
undirected graphs
O(n2 · log(n)) 3 · n2+n Greedy approach download link
MST-PF
(practical)
r=∞q=n-1,
undirected graphs
O(n3) 3 · n2+n Greedy approach download link

Important notices:


Running the source code


After the compilation (just type 'make' in command line on Linux), run it this way:

<program> <filename> <q>
Both 'edges' and 'matrix' formats are supported. Only matter the edge weights, not the node weights or descriptions (this one is used by Pajek to print the graphs). Check the description of the format here. Actually the 'matrix' format is not described on this page so, please, find two examples here:

Example of a graph given in the format of Pajek, by edges:

*vertices 6
1 "1" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
2 "2" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
3 "3" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
4 "4" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
5 "5" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
6 "6" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
*edges
1 2 1
2 3 1
1 6 1
6 5 2
5 4 1
4 3 1

Example of a graph given in the format of Pajek, by an adjacency matrix:

*vertices 4
1 "1" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
2 "2" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
3 "3" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
4 "4" ellipse x_fact 1.22866290766363 y_fact 1.22866290766363 fos 3.5 bw 0.0  ic Emerald
*matrix
0 1 4 5
2 0 2 4
1 4 0 1
5 3 1 0

More (small) example maps are given here: download link


Other known implementations


Relevant publications

[1] R. W. Schvaneveldt (Ed.); Pathfinder Associative Networks: Studies in Knowledge Organization; Norwood, NJ: Ablex (1990).  http://interlinkinc.net/PFBook.zip
RECEIVED: 16/4/2007; REVI

[2] A. Quirin, O. Cordon, J. Santamaria, B. Vargas-Quesada, F. Moya-Anegon; A new Variant of the Pathfinder Algorithm to Generate Large Visual Science Maps in Cubic Time; Information, Processing & Management Journal, 44(4): 1611-1623 (2008). 
RECEIVED: 16/4/2007; REVISED: 3/9/2007; ACCEPTED: 8/9/2007; IMPACT FACTOR 2007: 1.500.; CATEGORY: COMP. SCI., INF. SYST; ORDER: 27/92; DOI:10.1016/j.ipm.2007.09.005


[3] A. Quirin, O. Cordon, V. P. Guerrero-Bote, B. Vargas-Quesada, F. Moya-Anegon; A Quick MST-based Algorithm to Obtain Pathfinder Networks; Journal of the American Society for Information Science and Technology, 59(12): 1912-1924 (2008).

SUBMITTED: 12/9/2007; REVISED: 21/2/2008; NOTIFICATION OF ACCEPTANCE: 14/4/2008; PUBLISHED: 8/7/2008; IMPACT FACTOR 2007: 1.436; CATEGORY: COMP. SCI., INF. SYST; ORDER: 29/92; DOI:10.1002/asi.20904

[4] E. Serrano, A. Quirin, J. Botia, O. Cordon; Debugging Complex Software Systems by Means of Pathfinder Networks; Information Sciences, 180(5): 561-583 (2010).
NOTIFICATION OF ACCEPTANCE: 3/11/2009; IMPACT FACTOR 2009: 3.291; DOI: 10.1016/j.ins.2009.11.007