A* pathfinding

A* is essentially the least common denominator of planar pathfinding algorithms (unless you count Dijkstra's). It takes a graph and start and destination nodes, and returns a solution which is the set of nodes/edges comprising the shortest path from the start to the destination node. It is used on a connected (and for me, undirected) graph with non-negative edge weights to find the shortest path between two given nodes. Unlike Dijkstra's shortest path algorithm, A* prefers to try paths in the direction of the destination node before trying others. So given a visibility graph, it will likely find the optimum path quickly without evaluating other paths.

The details will not be covered in detail here. For a very nice explanation, visit Amit's page on A-Star. I will describe the basic idea though. For this, I will use the graph below.

example graph
Assume this graph for this demonstration, with given start and end nodes.

The algorithm stores two sets of nodes, an open and closed set. Ones in open are going to be evaluated, and ones in closed have been evaluated. The previous statement is not strictly true, since there are cases when nodes in closed need to be moved to open, and when nodes in open need to be re-evaluated. For the data structures, we need to check open and closed to see if a node is in it, and we also need to remove the node with lowest cost from open. So the requirements for the open list are slightly stronger, since a number must also be associated with each node in open. For my code, I use a generic list for closed, and a sorted list for open.

There are also so concepts to introduce. The notion of the cost of a node is the sum of edge weights (think of them as edge lengths if you want) that from the start node to the one in question for a given path. The start node has a cost of 0, and a neighbor that is 100 units away has a cost of 100. The cost is the cost of moving from the start to a node in question, on a particular path. The exact path depends on the history of visitation; more on that later.

The heuristic is a number that is calculated at a node being evaluated. It is equal to the cost of the node (defined above) plus a distance estimate to the destination. The estimate is the algorithm's best guess of the remaining edge lengths required to reach the destination. Most people just use straight line distance from the node in question to the destination, although there are better estimates. Look at Amit's page for details. Anyway, given this, the heuristic can be thought of as the estimated path length for a solution containing the path so far (from start to the node in question), plus whatever remains. The heuristic in the index that the open list uses to sort its nodes. A* chooses nodes with minimum heuristic values from open to evaluate next, which guarantees that paths "tending" to the destination are evaluated first.

cost and heuristic
Assuming the node evaluated is 8, the red line shows the edges considered in the cost, equal to 250. The green line is the cost plus the distance estimate, which is taken as the straight line distance, and equal to 450.

Now that the cost has been explained, and the heuristic (cost + remaining distance estimate) have been explained, we can describe the algorithm. First, the start node is added to the open list with zero heuristic (we could also just use straight line distance; it's the same since it will be shorter than or equal to any possible path from start to destination). Then the main loop begins.

In the main loop, the node with the smallest index (heurstic value) is removed from open, call it the current node. We now check each of neighbor of the current node in a sub-loop. If it's the destination, we have found the path, and I will describe how to reconstruct the path found later. If the neighbor is in closed, then if the cost to the current node plus the edge length to the neighbor (this sum is the new cost to the neighbor) less than the cost already stored with that neighbor, then we move it from closed to open with a recalculated heuristic (new cost + distance estimate). This is saying that if we visited a node already, but have found a better path to get there, we use the better one. A hypothetical example is shown below.

closed reevaluation
This illustration serves to demonstrate the re-evaluation of a closed node. Assume that we have visited node 4 using the red path already, so that node 4 is in closed. The heuristic using the red path is 150 + 100 + 140 + 100 = 490. The next iteration, we will pick the node 3 as the current node since its heuristic value is 150 + [260?] ~ 310 < 490 (we don't know the exact distance estimate). Now, it's neighbor 4 is already in closed, but since it now has a smaller heuristic this iteration (150 + 170 + 100 = 420), we will want to put 4 back on open and use the newfound green path to get to node 4 rather than the previous red one.
Note that we would not have visited node 8 from node 3, since node 4 has a better heuristic at that point. This example is not precisely faithful to how the algorithm work in this case, but you see the illustrative point.
Also note that the need to re-evaluate closed nodes happens because we have a cycle in the graph. So if you know your graph has no cycles or you don't care about finding the shortest path, you can simply choose to not re-evaluated closed (or open, if you so please) nodes.

Otherwise, if the neighbor is in open, we do a similar check on it as if it were in closed. If the new cost is less than the cost already stored in the neighbor, then we recalculate the heurstic for the neighbor in open (resort the list). This case and the previous one simply take into consideration the event of a shorter path found to a certain node that we know about, whether it is in the closed list or the open list.

If none of the previous cases were satisfied, then the neighbor is added to open with heurstic equal to the the current node's cost, plus the edge length between the nodes, plus the distance estimate from neighbor to destination. This concludes the main loop.

Each time I said we add a node to one of the lists, there is more to it than just adding it to the list. We have to also store the cost to the added node, which is calculated and set in fromcost. Then we must also store some information allowing us to reconstruct the solution path. For that, we store the edge that we took to get to the new node, which is named from. Both of these go in VL_GRAPH_NODE_PAYLOAD.

Below is the code:

A* implementation

int node_distance(_node<point2Di,int>* a, _node<point2Di,int>* b){
	int dx = a->val.x - b->val.x;
	int dy = a->val.y - b->val.y;
	return sqrt(dx*dx + dy*dy);
}

List<_edge<point2Di,int> > pathlist;

void astar(_node<point2Di,int>* start, _node<point2Di,int>* end){
	iList<_node<point2Di,int> > open;
	List<_node<point2Di,int> > closed;

	List_item<_node<point2Di,int> > *closed_temp_item;
	iList_item<_node<point2Di,int> > *open_temp_item;

	open.add(0, start);
	start->from = NULL;
	start->fromcost = 0;

	_edge<point2Di,int>* te;
	_node<point2Di,int>* tn;
	_node<point2Di,int> *cn; // current queue-node

	pathlist.clear();
	while(open.len() > 0){
		cn = open.dequeue();
//printf("At %d", cn->id); if(cn->from!=NULL)printf(" from %d", cn->from->id);printf("\n");
		closed.add(cn);

		if(cn != end){
			list_iterator<_edge<point2Di,int>* > ei = cn->edges.iterator();
			while(ei.hasNext()){
				te = ei.next();
				tn = (te->to == cn)?te->from:te->to;
//printf(" Considering %d\n", tn->id);
				int tval = cn->fromcost + te->val;
				int heur = node_distance(tn, end) + tval;

				if((open_temp_item = open.get_item(tn)) != NULL){ // if it's already in open, perhaps we found a better way to get there
//printf("  Found in open, Curnode: %d, candidate node: %d, fromcost: %d, newcost: %d\n", cn->id, tn->id, tn->fromcost, tval);
					if(tval < tn->fromcost){
//printf("  Removing and adding to open with new value %d\n", heur);
						open.remove_item(open_temp_item);
						open.add(heur, tn);
						tn->from = te;
						tn->fromcost = tval;
					}
				}
				else if((closed_temp_item = closed.get_item(tn)) != NULL){ // check to see if its value in closed is bigger than this one, if so put it in open with this path instead
//printf("  Found in closed, Curnode: %d, candidate node: %d, fromcost: %d, newcost: %d\n", cn->id, tn->id, tn->fromcost, tval);
					if(tval < tn->fromcost){
						closed.remove_item(closed_temp_item);
						open.add(heur, tn);
					}
				}else{
//printf("  Adding %d to open, with value %d\n", tn->id, heur);
					open.add(heur, tn);
					tn->from = te;
					tn->fromcost = tval;
				}
			}
		}else{ // we found our path
			tn = end;
			while(tn != NULL){
				te = tn->from;
				if(te != NULL){
					tn = (te->to == tn)?te->from:te->to;
					pathlist.prepend(te);
				}else{
					break;
				}
			}
			return;
		}
	}
	// if open is empty, there's no path
	return;
}

I have left in the debugging output because it helps to understand how the algorithm finds new nodes to evaluate. Also consider changing node_distance to only return 0, so that the algorithm reduces down to Dijkstra's shortest path algorithm.

The code largely follows what I described, but with more syntax specific to my classes thrown in. I go through each edge using a list iterator because it's more efficient that indexing each edge (O(n) vs. O(n2)). The node cn is what is described as the current node above. tn is the node being evaluated, and te is the edge between cn and tn. The tval variable is the cost to the node being evaluated (tn), and heur is the heuristic value to be assigned to tn.

You can also see that in the case that tn is already in the open list, then I actually remove it from open and add it back to open to force the node to be inserted in sorted order with the new heuristic. Writing the code to simply move the node into sorted position is too tedious at the moment.

I had also promised to describe how to extract the solution path, so here it is. We start at the end node, and since each node stores the edge that was used to get there, we use that edge to find the previous node, and the the edge used to get there, etc. The code is above, and we just keep adding the edges found to the beginning of a list of edges (global pathlist) until the start node is found where there is no previous edge. We can use pathlist by considering the start node and iterating through the list and going down each successive edge.