In this article we are going to tackle one of the problems in the graph theory department. The problem solution will be realized in two variants: **Python** and **PostgreSQL** (plain PL/pgSQL). Python version is aimed at visualizing the problem in detail and getting the grasp of the idea in a clear and accessible way. PostgreSQL solution makes use of recursive functions and recursive CTEs (*common table expressions*).

The code can be found in the git repository.

# Literature, resources

- Leonhard Euler
**Pierre-Henry Fleury**- I'd appreciate pointing me to a good resource. Some discussion can be found on stackexchange.com.- Fleury's algorithm (Wikipedia)
- Encyclopedia of Mathematics
- Lecture by David Eppstein (University of California)
- Neelam Yadav / geeksforgeeks.org

# Visualization

Although the problem (and the terms) might seem a bit overwhelming, the idea is simple: we want to visit all the vertices without ever traversing any of the edges more than once.

A very well known example: **draw an envelope without lifting a pencil**.

Thus, this article discusses the algorithm using this analogy. "An envelope with a little ribbon".

# Fleury's algorithm

Fleury's algorithm isn't quite efficient and there are other algorithms. However, only Fleury's algorithm is covered here.

This Wikipedia article (in Polish) provides a generic pseudocode for a solution using a stack data structure. The algorithm modifies the graph, therefore that article also discusses an abstract data structure that would implement a *copy constructor* allowing for a copy of the original graph.

The solution presented in this article operates directly on the original copy of the graph: the edges are being removed during the traversal. This can be easily modified, though.

**Rules**:

The article Eulerian path and circuit for undirected graph (geeksforgeeks.org) already discusses the theory in a clear and visual way. Here we're listing only the essential rules.

**Eulerian cycle** exists when:

- all non-zero degree vertices are connected (isolated nodes are not considered)
- all vertices have even degree.

**Eulerian path** exists when:

- all non-zero degree vertices are connected (isolated nodes are not considered)
- the graph has 0 or 2 odd degree vertices and other vertices have even degree.

# Python implementation

Python version is based on the excellent guide by Neelam Yadav / geeksforgeeks.org with tweaks, optimizations and added visualization.

```
class Graph:
def __init__(self, edges):
self.G = {}
[self.add_edge(*edge) for edge in edges]
self.G = {k: list(set(v)) for k, v in self.G.items()}
def add_edge(self, src, dst, idxs=None):
u_idx, v_idx = (idxs or [None, None])
self.G.setdefault(src, [])
gs = self.G[src]
gs.insert(u_idx, dst) if u_idx is not None else gs.append(dst)
self.G.setdefault(dst, [])
ds = self.G[dst]
ds.insert(v_idx, src) if v_idx is not None else ds.append(src)
def remove_edge(self, v, u):
u_idx = self.G[v].index(u)
self.G[v].pop(u_idx)
v_idx = self.G[u].index(v)
self.G[u].pop(v_idx)
return [u_idx, v_idx]
def get_odd_degree_vertices(self):
return {s: len(d) for s, d in self.G.items() if len(d) % 2 != 0}
def get_pathlen(self, src, limit=None, visited=None):
"""Recursive"""
visited = (visited or {})
visited[src] = 1
if limit is not None and len(visited) > limit:
return len(visited)
for adjacent in self.G[src]:
if adjacent not in visited:
self.get_pathlen(adjacent, limit, visited)
return len(visited)
def is_edge_valid(self, v, u):
if len(self.G[v]) == 1:
return True
vlen = self.get_pathlen(v)
[u_idx, v_idx] = self.remove_edge(v, u)
ulen = self.get_pathlen(u, vlen + 1)
self.add_edge(v, u, [u_idx, v_idx])
return vlen <= ulen
def traverse(self, v, traversed_paths):
"""Recursive"""
for u in self.G[v]:
if self.is_edge_valid(v, u):
path = [v, u]
back = [u, v]
assert path not in paths, "Does not traverse twice"
assert back not in paths, "Does not traverse twice"
paths.append(path)
self.remove_edge(v, u)
self.traverse(u, traversed_paths)
def find_eulerian_path(self):
odd_vertices = self.get_odd_degree_vertices()
n_odd = len(odd_vertices)
assert n_odd in [0, 2], "0 or 2 odd vertices"
nodes = odd_vertices if n_odd else self.G
assert nodes, "Has somewhere to start"
v = sorted(nodes.items(), key=lambda p: -p[1])[0][0]
traversed_paths = []
self.traverse(v, traversed_paths)
return traversed_paths
```

## Analysis

- start

The algorithm starts with the`find_eulerian_path()`

method. The code verifies the number of vertices having odd degree. If the graph satisfies the requirements, a maximum odd degree vertex is chosen as a start node (`v`

). If there are no vertices having odd degree, a node of highest degree is selected. - traversal

The algorithm then traverses the graph recursively (`traverse()`

). Each connected neighbor (`u`

) is considered a valid next edge if more vertices can be visited from`u`

than from`v`

. - choosing next edge

The choice of which edge to follow is based on the number of vertices reachable from`v`

(current vertex) and`u`

(any neighbor). Before calculating the degree of`u`

the currently considered edge`v-u`

is removed so that the algorithm doesn't go into a cycle. The edge is then added back again. The path length calculation (`get_pathlen`

) is optimized - it stops as soon as it finds that more nodes can be visited from`u`

than from`v`

. -
graph modification (in-place)

`remove_edge()`

- removes an edge and returns indices of the respective neighbors`add_edge()`

- adds an edge, allows for inserting the previously removed neighbors at their exact original offsets in the list

# PL/pgSQL (postgres implementation)

The solution is loosely based on the Python version.

The database has two schemas: *store* and *graph*.

```
CREATE SCHEMA IF NOT EXISTS store; -- data
CREATE SCHEMA IF NOT EXISTS graph; -- api
```

Implementation may use arrays or `hstore`

. By default a recursive variant with `hstore`

is used, so the `hstore`

extension has to be created:

```
CREATE EXTENSION IF NOT EXISTS hstore;
```

## Schema: store

The graph is stored in a single table. During graph traversal the edges will only by marked as removed.

```
CREATE TABLE IF NOT EXISTS store.graph (
edge_id SERIAL PRIMARY KEY,
src TEXT NOT NULL,
dst TEXT NOT NULL,
is_removed BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE UNIQUE INDEX graph_uidx ON store.graph(src, dst);
```

Removed nodes could also be stored in an array or an hstore.

## Graph: `add_edge()`

The graph is undirected, thus the `add_edge()`

function inserts two records: `src-dst`

and `dst-src`

. If the edge already exists, its `is_removed`

flag will be cleared. The function does not really need to return anything and can be `PERFORM`

ed. The return type is left for "debugging"/tracing purposes.

```
CREATE OR REPLACE FUNCTION graph.add_edge(_src TEXT, _dst TEXT)
RETURNS SETOF store.graph
AS $$
DECLARE
R store.graph;
BEGIN
-- RAISE NOTICE 'Adding edge: (% <-> %)', _src, _dst;
INSERT INTO store.graph(src, dst) VALUES(_src, _dst)
ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
RETURNING * INTO R;
RETURN NEXT R;
INSERT INTO store.graph(src, dst) VALUES(_dst, _src)
ON CONFLICT(src, dst) DO UPDATE SET is_removed = FALSE
RETURNING * INTO R;
RETURN NEXT R;
END
$$
LANGUAGE PLPGSQL;
```

## Graph: `remove_edge()`

Edge removal can be realized by setting the `is_removed`

field to `TRUE`

. The function needs to remove both edges: `src-dst`

and `dst-src`

.

```
CREATE OR REPLACE FUNCTION graph.remove_edge(_src TEXT, _dst TEXT)
RETURNS VOID
AS $$
BEGIN
-- RAISE NOTICE 'Marking edge removed: (% <-> %)', _src, _dst;
UPDATE store.graph SET is_removed = TRUE
WHERE (
src = _src AND dst = _dst
OR
src = _dst AND dst = _src
);
END
$$
LANGUAGE PLPGSQL;
```

## Data: storing the graph

At this point the `graph.add_edge()`

function can already be used to fill in the graph edges.

```
-- envelope
SELECT * FROM graph.add_edge('A', 'B');
SELECT * FROM graph.add_edge('A', 'C');
SELECT * FROM graph.add_edge('B', 'A');
SELECT * FROM graph.add_edge('B', 'C');
SELECT * FROM graph.add_edge('B', 'D');
SELECT * FROM graph.add_edge('B', 'E');
SELECT * FROM graph.add_edge('C', 'A');
SELECT * FROM graph.add_edge('C', 'B');
SELECT * FROM graph.add_edge('C', 'D');
SELECT * FROM graph.add_edge('C', 'E');
SELECT * FROM graph.add_edge('D', 'B');
SELECT * FROM graph.add_edge('D', 'C');
SELECT * FROM graph.add_edge('D', 'E');
SELECT * FROM graph.add_edge('E', 'B');
SELECT * FROM graph.add_edge('E', 'C');
SELECT * FROM graph.add_edge('E', 'D');
-- ribbon
SELECT * FROM graph.add_edge('E', 'F');
```

**Data**:

```
SELECT G.src, array_agg(G.dst ORDER BY G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
```

```
src | array_agg
-----+-----------
A | {B,C}
B | {A,C,D,E}
C | {A,B,D,E}
D | {B,C,E}
E | {B,C,D,F}
F | {E}
(6 rows)
```

## Graph: `assert_eulerian()`

The following function simply counts the vertices that have odd degree. It raises an exception when it finds a different number of odd degree vertices than 0 or 2. The first `SELECT`

statement (CTE) returns the vertices with odd degree, the second statement counts the records returned by the CTE and checks if there are 0 or 2 of them. The result of the comparison is placed in `is_eulerian_`

boolean variable. If `is_eulerian_`

is `FALSE`

, an error is raised.

The function could return a boolean but its use here is limited to **asserting** the graph satisfies the requirements. Hence, `RETURNS VOID`

.

```
CREATE OR REPLACE FUNCTION graph.assert_eulerian()
RETURNS VOID
AS $$
DECLARE
is_eulerian_ BOOLEAN;
BEGIN
WITH stmt_get_odd AS (
SELECT COUNT(*)
FROM store.graph
WHERE NOT is_removed
GROUP BY src
HAVING array_length(array_agg(dst), 1) % 2 != 0
)
SELECT COUNT(*) = ANY(ARRAY[0, 2]) INTO is_eulerian_ FROM stmt_get_odd;
IF NOT is_eulerian_ THEN
RAISE EXCEPTION 'Not Eulerian';
END IF;
END
$$
LANGUAGE PLPGSQL;
```

### Test: Eulerian graph

The "envelope with a ribbon" satisfies the requirements: 0 or 2 vertices of odd degree.

```
SELECT G.src, array_agg(G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
```

```
src | array_agg
-----+-----------
A | {B,C}
B | {A,C,D,E}
C | {A,B,D,E}
D | {B,C,E}
E | {B,C,D,F}
F | {E}
(6 rows)
```

**Check**:

```
SELECT graph.assert_eulerian();
```

```
assert_eulerian
-----------------
(1 row)
```

### Test: Non-Eulerian graph

When the graph does not satisfy the requirements, an exception is raised.

```
SELECT G.src, array_agg(G.dst) FROM store.graph G
WHERE NOT G.is_removed
GROUP BY G.src ORDER BY G.src;
```

```
src | array_agg
-----+-------------
A | {B,C,F}
B | {A,C,D,E,F}
C | {A,B,D,E,F}
D | {B,C,E,F}
E | {B,C,D,F}
F | {E,D,C,A,B}
(6 rows)
```

**Check**:

```
SELECT graph.assert_eulerian();
```

```
ERROR: Not Eulerian
CONTEXT: PL/pgSQL function graph.assert_eulerian() line 15 at RAISE
```

## Graph: `get_pathlen()`

, variants

This function is crucial and probably the hardest to get right. It can be realized in several ways: as a recursive CTE (no cursors, based on and array or `hstore`

) or as a recursive function (with cursors or an array). Three different implementations are discussed here.

** ARRAY based, recursive CTE variant**:

```
CREATE OR REPLACE FUNCTION graph.get_pathlen(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
pathlen_ INTEGER = 1;
BEGIN
WITH RECURSIVE my_traversal(dst, visited)
AS (
SELECT dst,
ARRAY[src]::TEXT[]
FROM store.graph
WHERE NOT is_removed AND src = _src
UNION
SELECT G.dst,
array_append(T.visited, G.src)
FROM store.graph G
JOIN my_traversal T ON G.src = T.dst
WHERE NOT G.is_removed AND NOT G.src=ANY(T.visited)
)
SELECT COUNT(DISTINCT(dst))
FROM my_traversal
INTO pathlen_;
RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;
```

The recursive traversal stores visited nodes in an array, the recursive term of the CTE excludes the visited nodes. The order of the records returned by this CTE may not be assumed, so the final result is the number of distinct elements.

Test:

```
SELECT graph.get_pathlen('E');
```

```
get_pathlen
-------------
6
```

** hstore based, recursive CTE variant**:

Hash based version is a counterpart of the Python implementation above, **but there's a catch**. Although the function does basically the same, the result for one node is different due to the nature of recursive CTE. In case of Python the algorithm iterates the already known list of adjacent vertices, while in a recursive CTE there are two `SELECT`

statements.

```
WITH RECURSIVE cte_stmt AS (
SELECT statement -- non-recursive term
UNION [ALL]
SELECT statement -- recursive term
)
SELECT * FROM cte_stmt;
```

In Python the recursive call happens already after the `if v not in visited`

condition, while in a recursive CTE the condition is a part of the recursive term and gets evaluated once the recursive term has already started its execution.

a) tracking source vertex (`v`

) in hstore will result in node `F`

being one off

b) tracking `v-u`

- same as a)

c) tracking destination vertex (`u`

) in hstore will result in node `E`

being one off

These solutions work for the "envelope with a ribbon" case, however rigorous testing should follow.

The implementation can be analyzed and traced using the versions with a dedicated type.

The default variant is described later. However, this one deserves analysis. It looks correct and it works, but without thorough testing one cannot be sure. The code repository includes the `hstore`

recursive CTEs in the `fixme_1.sql`

and `fixme_2.sql`

files.

The function in variant "a)":

```
CREATE OR REPLACE FUNCTION graph.get_pathlen_hstore(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
pathlen_ INTEGER = 1;
BEGIN
WITH RECURSIVE my_traversal(lvl, dst, visited, row_num)
AS (
SELECT 0 AS lvl,
dst,
hstore(src, '1'),
0::bigint AS row_num
FROM store.graph
WHERE NOT is_removed AND src = _src
UNION
SELECT T.lvl + 1 AS lvl,
G.dst,
visited || (G.src || '=>1')::hstore,
ROW_NUMBER() OVER (PARTITION BY lvl) AS row_num
FROM store.graph G
JOIN my_traversal T ON G.src = T.dst
WHERE NOT G.is_removed AND
NOT exist(T.visited, G.src)
)
SELECT array_length(akeys(visited), 1)
FROM my_traversal
ORDER BY lvl DESC, row_num DESC LIMIT 1
INTO pathlen_;
RETURN pathlen_;
END
$$
LANGUAGE PLPGSQL;
```

The CTE returns the following:

```
lvl | dst | visited | row_num
-----+-----+--------------------------------------------------+---------
2 | A | "B"=>"1", "D"=>"1", "E"=>"1" | 1
2 | A | "B"=>"1", "C"=>"1", "E"=>"1" | 2
-- [...]
3 | A | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 1
3 | A | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 2
-- [...]
3 | E | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 37
3 | E | "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 38
4 | A | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 1
4 | B | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 2
-- [...]
4 | E | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 21
4 | E | "A"=>"1", "B"=>"1", "C"=>"1", "D"=>"1", "E"=>"1" | 22
```

The final statement counts the keys in the hash (`visited`

) as returned in the last record returned by the recursive CTE.

Hence: `ORDER BY lvl DESC, row_num DESC LIMIT 1`

.

** hstore base, recursive function variant (the default)**:

For the sake of clarity, the implementation is split into two separate functions:

- the entry function
- the recursive function (using
`IN`

and`INOUT`

parameters, although the result might also be explicitely returned as`hstore`

type)

The entry function:

```
CREATE OR REPLACE FUNCTION graph.get_pathlen_recursive(_src TEXT)
RETURNS INTEGER
AS $$
DECLARE
visited_ hstore = hstore(_src, '1');
BEGIN
SELECT * FROM graph._get_pathlen_recursive(_src, visited_) INTO visited_;
RETURN array_length(akeys(visited_), 1);
END
$$
LANGUAGE PLPGSQL;
```

The recursive algorithm:

```
CREATE OR REPLACE FUNCTION graph._get_pathlen_recursive(
IN _src TEXT,
INOUT _visited hstore)
RETURNS hstore
AS $$
DECLARE
u_ TEXT;
BEGIN
SELECT COALESCE(_visited, '') || hstore(_src, '1') INTO _visited;
FOR u_ IN SELECT dst FROM store.graph
WHERE NOT is_removed AND src = _src
LOOP
IF NOT exist(_visited, u_) THEN
SELECT graph._get_pathlen_recursive(u_, _visited) || _visited
INTO _visited;
END IF;
END LOOP;
END
$$
LANGUAGE PLPGSQL;
```

This approach yields correct results and using `hstore`

allows for optimum performance.

**Test**:

```
SELECT graph.get_pathlen_recursive('E');
```

```
get_pathlen_recursive
-----------------------
6
(1 row)
```

```
SELECT graph.get_pathlen_recursive('F');
```

```
get_pathlen_recursive
-----------------------
6
```

## Graph: `is_edge_valid()`

This is an exact SQL counterpart of the Python implementation.

```
CREATE OR REPLACE FUNCTION graph.is_edge_valid(_src TEXT, _dst TEXT)
RETURNS BOOLEAN
AS $$
DECLARE
vn_ INTEGER = 0;
un_ INTEGER = 0;
BEGIN
SELECT count(*) FROM store.graph
WHERE NOT is_removed AND src = _src
GROUP BY src
INTO vn_;
IF vn_ = 1 THEN
RETURN TRUE;
END IF;
SELECT graph.get_pathlen_recursive(_src) INTO vn_;
PERFORM graph.remove_edge(_src, _dst);
SELECT graph.get_pathlen_recursive(_dst) INTO un_;
PERFORM graph.add_edge(_src, _dst);
RETURN vn_ <= un_;
END
$$
LANGUAGE PLPGSQL;
```

## Graph: `traverse()`

This function traverses the graph in a recursive fashion. For each adjacent vertex the edge `v-u`

is checked. If the edge was not already traversed and is valid (more nodes reachable from `u`

), the path is added to the solution, the edge `v-u`

is removed and the traversal follows the edges starting at vertex `u`

. The collected paths are returned as an array of text elements.

```
CREATE OR REPLACE FUNCTION graph.traverse(
_v TEXT,
_paths TEXT[])
RETURNS TEXT[]
AS $$
DECLARE
u_ TEXT;
is_valid_ BOOLEAN = FALSE;
edge_src_dst_ TEXT;
edge_dst_src_ TEXT;
BEGIN
FOR u_ IN SELECT dst FROM store.graph
WHERE NOT is_removed AND src = _v
LOOP
SELECT * FROM graph.is_edge_valid(_v, u_) INTO is_valid_;
IF is_valid_ THEN
edge_src_dst_ = (_v || '-' || u_);
edge_dst_src_ = (u_ || '-' || _v);
IF (edge_src_dst_ = ANY(_paths) OR edge_dst_src_ = ANY(_paths))
THEN
RETURN _paths;
END IF;
SELECT * FROM array_append(_paths, _v || '-' || u_) INTO _paths;
PERFORM graph.remove_edge(_v, u_);
SELECT * FROM graph.traverse(u_, _paths) INTO _paths;
END IF;
END LOOP;
RETURN _paths;
END
$$
LANGUAGE PLPGSQL;
```

## Graph: `find_eulerian_path()`

The algorithm starts with this function. An odd degree vertex is selected as a starting point. If such a vertex is found, the recursive traversal starts. The results are returned as a set of text records.

```
CREATE OR REPLACE FUNCTION find_eulerian_path()
RETURNS SETOF TEXT
AS $$
DECLARE
v_ TEXT;
path_ TEXT[] = ARRAY[]::TEXT[];
BEGIN
SELECT src,
array_length(array_agg(dst), 1) AS degree
FROM store.graph
WHERE NOT is_removed
GROUP BY src HAVING array_length(array_agg(dst), 1) % 2 != 0
ORDER BY degree DESC
LIMIT 1
INTO v_;
IF FOUND THEN
-- RAISE NOTICE 'Start vertex: %', v_;
SELECT * FROM graph.traverse(v_, path_) INTO path_;
END IF;
RETURN QUERY SELECT UNNEST(path_);
END
$$
LANGUAGE PLPGSQL;
```

**Test**:

```
SELECT * FROM find_eulerian_path();
```

```
NOTICE: Start vertex: D
find_eulerian_path
--------------------
D-B
B-A
A-C
C-B
B-E
E-C
C-D
D-E
E-F
(9 rows)
```

```
EXPLAIN ANALYZE SELECT * FROM find_eulerian_path();
```

```
NOTICE: Start vertex: D
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Function Scan on find_eulerian_path (cost=0.25..10.25 rows=1000 width=32) (actual time=7.406..7.407 rows=9 loops=1)
Planning Time: 0.011 ms
Execution Time: 7.413 ms
(3 rows)
```