Recently I’ve been refreshing my Python and got it in mind to make a little command-line utility. But what should I make? Whatever I could think of has probably already been done at least a few times, but that’s not the point; the point is the practice. Since I’m interested in scraping data from websites, a conceptually easy problem was to parse imgur albums and image pages for the actual images and then download them.
I present my script, imgurdl, a quick and easy imgur downloader. Aside from the obvious, the downloaded images are saved so that albums get their own folder that is named after the download link (e.g., the ‘abcd’ folder from imgur.com/a/abcd), while individual images are saved together in the output folder.
Another design goal was to make the command-line options as flexible as possible. imgur uses different URL formats for albums and images, and the command line needed a way to deal with these. It accepts a whole list of URLs, or you can specify which are albums and which are images by using the --albums and --images arguments. But typing out URLs is tedious, so to simplify matters, all you need is the image code or album code. And since typing the --albums and --images arguments is still annoying, you can omit these and just enter the codes, where only the album codes need to be prefixed with ‘/a/’.
$ python3 imgurdl.py --albums abcd 1234 --images efgh asfa -o /download/folder
This downloads two albums and two images to /download/folder.
$ python3 imgurdl.py /a/abcd
Download a single album.
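To illustrate the argument handling described above, here is a minimal sketch using argparse. It is not the actual imgurdl source: the helper names build_parser, classify and collect are hypothetical, and the rule for recognising albums (the presence of ‘/a/’) is an assumption based on the description in this post.

```python
import argparse


def build_parser():
    """Build a parser that accepts bare codes, URLs, or explicit lists."""
    parser = argparse.ArgumentParser(description="Sketch of an imgur downloader CLI")
    parser.add_argument("codes", nargs="*", default=[],
                        help="URLs or codes; album codes are prefixed with /a/")
    parser.add_argument("--albums", nargs="+", default=[], help="album codes or URLs")
    parser.add_argument("--images", nargs="+", default=[], help="image codes or URLs")
    parser.add_argument("-o", "--output", default=".", help="output folder")
    return parser


def classify(item):
    """Return (kind, code) where kind is 'album' or 'image'."""
    item = item.rstrip("/")
    kind = "album" if "/a/" in item else "image"
    # Keep the last path segment and drop any file extension.
    code = item.split("/")[-1].split(".")[0]
    return kind, code


def collect(args):
    """Merge explicit --albums/--images with the untyped positional codes."""
    albums = [classify(a)[1] for a in args.albums]
    images = [classify(i)[1] for i in args.images]
    for item in args.codes:
        kind, code = classify(item)
        (albums if kind == "album" else images).append(code)
    return albums, images


# Example: one album and one image, given without --albums/--images.
demo_args = build_parser().parse_args(["/a/abcd", "efgh", "-o", "out"])
demo_albums, demo_images = collect(demo_args)
print("albums:", demo_albums, "images:", demo_images, "->", demo_args.output)
```

Treating everything as a code first and only then deciding whether it is an album keeps the three input styles (full URL, explicit flag, bare code) on a single code path.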
There you have it, a quick and flexible command-line utility to download images from imgur.
This is the last post in an introductory series about Markov chains, Bayes networks and now Hidden Markov models. Hidden Markov models (HMMs) work much like Markov chains: a system moves from state to state with fixed transition probabilities, and each state produces one of several possible outcomes. HMMs differ in that the states of the underlying Markov chain are hidden, and only the outcomes they produce are observable. In a Markov chain the observations are identical to the (observable) states, while an HMM has hidden (unobservable) states and observable outcomes.
Let’s consider the weather as a concrete example. For simplicity, the weather conditions are either sunny, cloudy, or raining. These conditions are observable, and will serve as the observation sequence. However, many factors influence the weather, so hidden states will represent possible causes for the different weather. Specifically, I will consider only high and low pressure as influences on the weather. The graphic on the right represents the HMM we will consider. In this model, high and low pressure are hidden states (we don’t own a barometer), but we can observe the weather. There is a 0.7 chance of starting in a high pressure state. Red arrows correspond to high pressure state transition probabilities, and blue arrows correspond to low pressure. For example, in the high pressure hidden state, the output probabilities are 0.6 for sun, 0.3 for rain and 0.1 for clouds, and there is a low pressure state transition probability of 0.7.
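To make the model concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observed weather sequence by summing over all hidden pressure sequences. Only the starting probability of 0.7 and the high pressure output probabilities come from the description above. I am reading the 0.7 transition probability as high pressure to low pressure, which is an assumption, and every low pressure number is made up for illustration.

```python
STATES = ("high", "low")

# Chance of starting in each hidden state (0.7 for high is from the text).
START = {"high": 0.7, "low": 0.3}

# Hidden state transition probabilities.
TRANS = {
    "high": {"high": 0.3, "low": 0.7},  # assumption: 0.7 read as high -> low
    "low": {"high": 0.4, "low": 0.6},   # hypothetical values
}

# Output (emission) probabilities for each hidden state.
EMIT = {
    "high": {"sun": 0.6, "rain": 0.3, "clouds": 0.1},  # from the text
    "low": {"sun": 0.1, "rain": 0.6, "clouds": 0.3},   # hypothetical values
}


def forward(observations):
    """Return P(observations) under the HMM, via the forward algorithm."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[prev] * TRANS[prev][s] for prev in STATES)
            for s in STATES
        }
    return sum(alpha.values())


print(forward(["sun", "sun", "rain"]))
```

The dictionary alpha is all that carries over from one observation to the next, so the cost grows linearly with the length of the sequence rather than exponentially with the number of hidden paths.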
In my earlier post about Markov models, I introduced the simple Markov model called a Markov chain. Before I move on to discuss Hidden Markov models, I want to digress a bit and first introduce Bayesian networks. This will help us
Bayesian networks (or Bayes networks, Bayes nets) are probabilistic graphical models that represent random variables linked by their conditional dependencies. For example, if you stepped outdoors and saw that the ground was wet and the sky was cloudy, you would infer that it has recently rained, and may soon rain again. The water on the ground is conditionally dependent on the sky being cloudy and on it having rained recently, but if the sky were sunny and clear, the ground might be wet for another reason. Generally speaking, Bayes nets are a tool to show causality between events and to update our beliefs about events given some information about them.
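The wet ground example can be sketched as a tiny three-node network, cloudy → rain → wet ground, with beliefs updated by brute-force enumeration. All of the probabilities below are invented for illustration, and the chain-shaped structure is a simplifying assumption rather than the only way to wire these variables together.

```python
from itertools import product

# Hypothetical probabilities, chosen only to illustrate the mechanics.
P_CLOUDY = 0.5
P_RAIN_GIVEN_CLOUDY = {True: 0.8, False: 0.1}
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.2}  # 0.2: wet for some other reason


def joint(cloudy, rain, wet):
    """P(cloudy, rain, wet) as a product of the local conditional tables."""
    p = P_CLOUDY if cloudy else 1 - P_CLOUDY
    p_rain = P_RAIN_GIVEN_CLOUDY[cloudy]
    p *= p_rain if rain else 1 - p_rain
    p_wet = P_WET_GIVEN_RAIN[rain]
    p *= p_wet if wet else 1 - p_wet
    return p


def posterior_rain(**evidence):
    """P(rain=True | evidence), summing the joint over consistent worlds."""
    numerator = denominator = 0.0
    for cloudy, rain, wet in product([True, False], repeat=3):
        world = {"cloudy": cloudy, "rain": rain, "wet": wet}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts what we observed
        p = joint(cloudy, rain, wet)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator


print("no evidence:        ", posterior_rain())
print("wet ground:         ", posterior_rain(wet=True))
print("wet ground, cloudy: ", posterior_rain(wet=True, cloudy=True))
print("wet ground, clear:  ", posterior_rain(wet=True, cloudy=False))
```

Seeing wet ground raises the belief that it rained, a cloudy sky raises it further, and a clear sky lowers it again, because the other explanation for the wet ground becomes more plausible.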
Bayesian networks and Markov models are similar in that they are useful graphical representations of nodes with an implied directed order. They differ in that Markov chains represent temporal relations between events (nodes), and Bayesian networks represent causal relations.
The future is now, or more formally, the future depends only on the present, and not the past. Markov models and Hidden Markov Models are canonical models for the analysis of temporal or sequential data. Andrei Markov, a Russian mathematician known primarily for his work on stochastic processes, started their development in the early 1900s. These models are a cornerstone of many disciplines, from machine learning and artificial intelligence to chemical reactions, particle physics, weather prediction, financial forecasting and even biological processes.
The Markov model family is powerful precisely because of its flexibility and simplicity. Markov’s key insight made these analyses broadly applicable: given the present moment, the future is independent of the past. In other words, the future is best predicted by the information known now, and older information adds nothing further. This conditional independence is deceptively simple, because we experience the future as dependent on the past. However, if we view time as a chain of events, such that one moment is only dependent on the previous moment, and independent of the next (after all, it hasn’t occurred yet), we have the basis for the simplest Markov model, called a Markov chain.
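A Markov chain of this kind can be sketched in a few lines. The two weather states and their transition probabilities below are hypothetical; the point is only that each step looks at the current state and nothing earlier.

```python
import random

# Hypothetical transition probabilities: TRANSITIONS[now][next].
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def step(state, rng):
    """Pick the next state using only the current state."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(start, n_steps, seed=0):
    """Return a chain of n_steps + 1 states beginning at start."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    chain = [start]
    for _ in range(n_steps):
        chain.append(step(chain[-1], rng))  # history before chain[-1] is ignored
    return chain


print(simulate("sunny", 10))
```

Note that step never receives the full chain, only its last element, which is the Markov property expressed as a function signature.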
Today I came across a wonderful 5-minute video explaining in very simple terms how computers handle modern encryption. The type of encryption described is called public key encryption.
This coincidentally comes as I’ve just finished reading Neal Stephenson’s Cryptonomicon. Briefly, the novel intertwines two time periods, the World War II era and the present (ca. 2001). At the height of the code-breaking efforts in WWII, Lawrence Waterhouse, a young mathematical genius assigned to Detachment 2702, works alongside famed mathematician Dr. Alan Turing and Marine Bobby Shaftoe. Their mission is to prevent Nazi Germany from discovering that the Allies have cracked the Enigma machine.