NumCalc is my personal numerical methods program where I’ve implemented some nice algorithms for numerical computation. The new in the list is Markov Chain.

The Wikipedia article (link above) is far too complex… I’ll try to give a simplified explanation:

A travelling salesman goes back and forth in a set of cities and, given the city he is currently in, you want to know what’s the next city he’ll travel. Of course, he won’t show you his travel itinerary.

The simplest way of doing it is to record all travels he does within time. For each city, you have a counter of how many times he went from each city to all other. If you think these numbers as a portion of all the travels from each city you have a probability of going to any other city in the list.

Example: When he was on Paris, he went 3 times to London, 2 times to Amsterdam and only 1 time to Milan. It means that, 3 out of 6 times (50%) he went to London, so the probability of going again is 50%.

For such small quantities it’s weird to assume that the behaviour will be always the same (he can go to new cities as well) but when the amount of statistics you have is big, the behaviour become very repetitive and thus, predictable.

Real Cases:

  • MegaHAL uses an advanced Markov model to create chat bots by replying what people said before based primarily on the sole probability of one word coming after the other.
  • HMMER is hidden Markov model (a Markov model to predict another Markov model to predict something else) that can do powerful searches within long and scrambled sequences of proteins and genes. The IntrePro group use it to find their protein matches against UniProt.

Of course my super-simplified model is far from being that efficient and useful, but it’s a good start to understand how simple and how powerful they are. You can download it from its webpage.