Advanced-Topics

Algorithm

Topological Sort

Dijkstra’s Algorithm

  • Dijkstra’s algorithm is an algorithm for finding the shortest paths between nodes in a graph,
  • Dijkstra is a special case for A* (when the heuristics is zero)
  • It has one cost function, which is real cost value from source to each node: f(x)=g(x). It finds the shortest path from source to every other node by considering only real cost.

A* search:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
def astar(forest, sr, sc, tr, tc):
R, C = len(forest), len(forest[0])
heap = [(0, 0, sr, sc)]
cost = {(sr, sc): 0}
while heap:
f, g, r, c = heapq.heappop(heap)
if r == tr and c == tc: return g
for nr, nc in ((r-1,c), (r+1,c), (r,c-1), (r,c+1)):
if 0 <= nr < R and 0 <= nc < C and forest[nr][nc]:
ncost = g + 1 + abs(nr - tr) + abs(nc - tc)
if ncost < cost.get((nr, nc), 9999):
cost[nr, nc] = ncost
heapq.heappush(heap, (ncost, g+1, nr, nc))
return -1
  • Time: O(E + V * log(V))
    • For dense graph where E ~ V^2

      Python + heapq implementation

      1
      2
      3
      4
      5
      6
      7
      8
      9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      39
      40
      # https://gist.github.com/kachayev/5990802
      from collections import defaultdict
      from heapq import *
      def dijkstra(edges, f, t):
      g = defaultdict(list)
      for l,r,c in edges:
      g[l].append((c,r))
      q, seen = [(0,f,[])], set()
      while q:
      (cost,v1,path) = heappop(q)
      if v1 not in seen:
      seen.add(v1)
      path = path + [v1]
      if v1 == t: return (cost, path)
      for c2, v2 in g.get(v1, ()):
      if v2 not in seen:
      heappush(q, (cost+c2, v2, path))
      return float("inf")
      if __name__ == "__main__":
      edges = [
      ("A", "B", 7),
      ("A", "D", 5),
      ("B", "C", 8),
      ("B", "D", 9),
      ("B", "E", 7),
      ("C", "E", 5),
      ("D", "E", 15),
      ("D", "F", 6),
      ("E", "F", 8),
      ("E", "G", 9),
      ("F", "G", 11)
      ]
      print "A -> E:"
      print dijkstra(edges, "A", "E")
      print "F -> G:"
      print dijkstra(edges, "F", "G")

exercises

https://leetcode.com/problems/cheapest-flights-within-k-stops/description/

A* is an informed search algorithm, or a best-first search, meaning that it solves problems by searching among all possible paths to the solution (goal) for the one that incurs the smallest cost (least distance travelled, shortest time, etc.), and among these paths it first considers the ones that appear to lead most quickly to the solution

  • It has two cost function.

    • g(x): same as Dijkstra. The real cost to reach a node x.
    • h(x): approximate cost from node x to goal node. It is a heuristic function. This heuristic function should never overestimate the cost. That means, the real cost to reach goal node from node x should be greater than or equal h(x). It is called admissible heuristic.
    • The total cost of each node is calculated by f(x)=g(x)+h(x)
  • References: https://stackoverflow.com/questions/1332466/how-does-dijkstras-algorithm-and-a-star-compare

Eulerian path

e.g.
https://leetcode.com/problems/reconstruct-itinerary/discuss/78768/Short-Ruby-Python-Java-C++

Hamiltonian path

a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path that is a cycle

AVL Tree

Red Black Tree

Binary Index Tree

Segement Tree

Design

KMP

关于kmp算法,讲的最好的当属阮一峰的字符串匹配的KMP算法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
#KMP
def kmp_match(s, p):
m = len(s); n = len(p)
cur = 0#起始指针cur
table = partial_table(p)
while cur<=m-n:
for i in range(n):
if s[i+cur]!=p[i]:
cur += max(i - table[i-1], 1)#有了部分匹配表,我们不只是单纯的1位1位往右移,可以一次移动多位
break
else:
return True
return False
#部分匹配表 (The Partial Match Table)
def partial_table(p):
'''''partial_table("ABCDABD") -> [0, 0, 0, 0, 1, 2, 0]'''
prefix = set()
postfix = set()
res = [0]
for i in range(1,len(p)):
prefix.add(p[:i])
postfix = {p[j:i+1] for j in range(1,i+1)}
res.append(max(map(len,prefix & postfix)))
return res
print naive_match("BBC ABCDAB ABCDABCDABDE", "ABCDABD")
print partial_table("ABCDABD")

Hash collision

Bloomfilter

LSH

scalaibility

object-oriented design

multi-thread/multi-processs

lock / dead-lock

async

Hyperloglog