A High Schooler's Guide to Independent Component Analysis
October 29, 2023
You are hosting a grand birthday party at your house. It's your special day, and you have a sneaky plan in mind--eavesdropping on your friends' juicy gossip. You hide several microphone recorders all over the house. As your birthday unfolds, everything goes precisely as you had envisioned. But the next day, when you excitedly play back the recordings, you are met with a shocking revelation: all the voices are intertwined, and it's nearly impossible to decipher what anyone is saying.
This may sound like a plot from a mystery movie, but the solution to this perplexing problem lies in a powerful tool known as Independent Component Analysis (ICA) [1, 2, 3]. ICA isn't just confined to unmixing tangled audio sources; it can also be used to rectify image glitches, disentangle brain signals linked to various functions, pinpoint the factors influencing stock prices, and much more. In fact, by the end of this blog post, you will clearly understand how many of these daunting tasks can be solved in practice. Most importantly, you will hear what your friends say about you behind your back :).

The ICA model
Before we explore how to untangle the voices from the recorded audio, let's first understand how the magic happens behind the scenes. Imagine you have a microphone at your party – it's like an electronic ear that turns sound into electricity. We'll call the output of this microphone $x_1(t)$, a signal that changes with time $t$.

If you have multiple microphones (let's say $n$ of them), you get $n$ recorded signals: $x_1(t), x_2(t), \ldots, x_n(t)$.

Now, imagine that amid the laughter and chatter, there are $n$ independent sound sources, $s_1(t), s_2(t), \ldots, s_n(t)$: your friends' conversations, the music, the clinking of glasses. Each microphone picks up a blend of all of them.

To make things simpler, let's say there are just two sound sources ($s_1(t)$ and $s_2(t)$) and two microphones ($x_1(t)$ and $x_2(t)$). Each microphone records a weighted sum of the two sources, where the weights depend on how far each source is from that microphone.

Mathematically, we can represent this mixing like this:

$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) \tag{1}$$
$$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) \tag{2}$$

Here, the equations say that each recording is a weighted sum of the two sources, with mixing coefficients $a_{11}, a_{12}, a_{21}, a_{22}$ determined by the positions of the sources relative to the microphones.
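To make this concrete, here is a minimal sketch of the mixing in code. The coefficient values and the uniform-noise stand-ins for the sources are illustrative assumptions, not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "sources": two independent signals sampled at 1000 time steps.
# (Real sources would be audio waveforms; random noise is a placeholder.)
s1 = rng.uniform(-1, 1, size=1000)
s2 = rng.uniform(-1, 1, size=1000)

# Hypothetical mixing coefficients, e.g. set by how far each source
# sits from each microphone.
a11, a12, a21, a22 = 2.0, 3.0, 4.0, 1.0

# Eq. (1) and (2): each microphone records a weighted sum of the sources.
x1 = a11 * s1 + a12 * s2
x2 = a21 * s1 + a22 * s2
```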
Now, considering Eq. (1) and (2), how can we retrieve the values of $s_1(t)$ and $s_2(t)$ from the recordings $x_1(t)$ and $x_2(t)$ alone? Remember, we know neither the sources nor the mixing coefficients.
Is it even possible to solve this problem? There are two equations but six unknowns. Can we add more equations, i.e., add more microphones and turn it into a familiar linear programming problem? Unfortunately, it's not that straightforward. Adding more microphones would give rise to even more unknowns, e.g., a third microphone introduces two new coefficients, $a_{31}$ and $a_{32}$. The way out of this impasse is a single key assumption: the sound sources are statistically independent of one another.
To put it simply, the independence assumption means that the sound sources operate without influencing one another. For instance, if the sound sources represent different conversations, this assumption implies that the content of one conversation doesn't depend on the other, which seems reasonable.
Independence
Before we delve further, let's make a subtle adjustment to our equations:

$$x_1 = a_{11} s_1 + a_{12} s_2$$
$$x_2 = a_{21} s_1 + a_{22} s_2$$

We have dropped the time dependence of $x_1, x_2, s_1, s_2$. Instead of treating them as signals that evolve over time, we now view each source $s_i$ (and therefore each recording $x_i$) as a random variable, and the values observed at different time instants as samples of that random variable.
Uniformly distributed sources
Let's make these concepts more tangible. Imagine that $s_1$ and $s_2$ are random variables uniformly distributed between 0 and 1 -- as if each were generated by spinning a wheel that can stop anywhere in $[0, 1]$ with equal probability. If we draw many sample pairs $(s_1, s_2)$ and plot them, they fill a square, as shown in Fig. 2.

Now, let's think about a situation where both $s_1$ and $s_2$ are mixed according to Eq. (1) and (2) to produce $x_1$ and $x_2$. The samples of $(x_1, x_2)$ no longer fill a square; they form a parallelogram.
Now, let's specify some numbers for the mixing coefficients -- say $a_{11} = 2$, $a_{12} = 3$, $a_{21} = 4$, $a_{22} = 1$ -- and trace two special source points through the mixing:
Blue point $(s_1, s_2) = (1, 0)$:
$$x_1 = a_{11} s_1 + a_{12} s_2 = a_{11} \cdot 1 + a_{12} \cdot 0 = a_{11} = 2$$
$$x_2 = a_{21} s_1 + a_{22} s_2 = a_{21} \cdot 1 + a_{22} \cdot 0 = a_{21} = 4$$

Red point $(s_1, s_2) = (0, 1)$:
$$x_1 = a_{11} s_1 + a_{12} s_2 = a_{11} \cdot 0 + a_{12} \cdot 1 = a_{12} = 3$$
$$x_2 = a_{21} s_1 + a_{22} s_2 = a_{21} \cdot 0 + a_{22} \cdot 1 = a_{22} = 1$$
Take a close look at those blue and red points; they're not just random dots. They happen to be corners in the plot of the mixed samples $(x_1, x_2)$, and their coordinates, $(2, 4)$ and $(3, 1)$, are exactly the columns of mixing coefficients, $(a_{11}, a_{21})$ and $(a_{12}, a_{22})$.
This means that if our sound sources were random numbers (like spinning a wheel), and we could somehow pinpoint these special corner points from our recordings, we'd be able to figure out the exact values of all four mixing coefficients -- and once we know the coefficients, we can solve Eq. (1) and (2) for the sources.
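Here is a small sketch of this corner-hunting idea, reusing the coefficients from the example above. The trick used to pick out the corners (taking the extremes of $x_2 - x_1$) happens to work for this particular mixing matrix and is only meant to illustrate the principle:

```python
import numpy as np

rng = np.random.default_rng(0)

# Many samples of the two uniform sources, mixed as in Eq. (1), (2).
S = rng.uniform(0, 1, size=(2, 50_000))
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])
X = A @ S                              # each column is one (x1, x2) point

# The cloud of (x1, x2) points fills a parallelogram. For this A, two of
# its corners are the extremes of x2 - x1, and their coordinates are the
# columns of A -- exactly the blue and red points above.
blue = X[:, np.argmax(X[1] - X[0])]    # close to (a11, a21) = (2, 4)
red  = X[:, np.argmin(X[1] - X[0])]    # close to (a12, a22) = (3, 1)
print(np.round(blue, 2), np.round(red, 2))

# Knowing the corners means knowing A, so we can unmix the recordings:
A_hat = np.column_stack([blue, red])
S_hat = np.linalg.inv(A_hat) @ X       # approximately recovers S
```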
Beyond Uniform Distribution
The method we explored earlier works like a charm, but there's a catch -- it only works when our sound sources are uniformly distributed, just like spinning a wheel. In real life, it's highly unlikely that the sounds we capture at your party follow such a neat pattern. So, we need to equip ourselves with more tools to develop a general Independent Component Analysis (ICA) solver that can handle a wider range of situations.
Gaussian Distribution
In Section 2.1, we looked at the uniform distribution, which gave us a good start. But there's another important distribution in the ICA world, and it's called the Gaussian distribution. This distribution has its own unique characteristics and is quite different from the uniform distribution.
Imagine a bell-shaped curve like the one shown in Fig. 4 [Right]. That's what the probability density of a Gaussian distribution looks like. In contrast, the density of the uniform distribution looks like Fig. 4 [Left]: all numbers between 0 and 1 are equally likely, so the density is constant over the interval from 0 to 1. The Gaussian distribution, on the other hand, assigns different probabilities to different numbers, as indicated by the varying density in Fig. 4 [Right]. When we take samples from a standard Gaussian distribution (similar to our uniform samples in Fig. 2), they might look something like what you see in Fig. 5 [Left].
Now, let's talk about samples from a mixture of two independent Gaussian sources, shown in Fig. 5 [Right]. Unlike the uniform case, the resulting cloud of points has no corners or edges to latch onto: it looks the same in every direction. There are no special landmarks from which we could read off the mixing coefficients, and in fact, for Gaussian sources the mixing matrix cannot be uniquely identified at all. For ICA to work, we must therefore assume that the sources are non-Gaussian.
This assumption is pretty reasonable, considering that interesting signals like sound or images typically don't follow the Gaussian distribution. So, we're setting the stage to tackle the more complex cases we might encounter in the real world.
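To see the contrast yourself, the sketch below (with an arbitrary, illustrative mixing matrix) plots mixed uniform samples next to mixed Gaussian samples; the first cloud is a parallelogram with tell-tale corners, the second a featureless elliptical blob:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])                     # illustrative mixing matrix

S_unif = rng.uniform(0, 1, size=(2, 3000))     # uniform sources
S_gauss = rng.standard_normal((2, 3000))       # Gaussian sources

X_unif = A @ S_unif
X_gauss = A @ S_gauss

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(*X_unif, s=2)
axes[0].set_title("Mixed uniform sources: corners reveal A")
axes[1].scatter(*X_gauss, s=2)
axes[1].set_title("Mixed Gaussian sources: no corners to find")
plt.show()
```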
Vector Algebra
An important mathematical concept that will greatly aid our understanding of ICA is that of vectors and matrices. If you're already familiar with these concepts, feel free to skip ahead to the next sub-section. Vectors provide a concise way to represent and manipulate large sets of numbers. A vector of size $n$ (an $n$-dimensional vector) is simply an ordered collection of $n$ numbers, stacked into a single column.

Similarly, a matrix of size $m \times n$ is a rectangular grid of numbers with $m$ rows and $n$ columns; we write $a_{ij}$ for the entry in row $i$, column $j$.

A transpose of a matrix (or vector), denoted by the superscript $T$, flips it over its diagonal: rows become columns and columns become rows, so the transpose of an $m \times n$ matrix is an $n \times m$ matrix.

A multiplication between an $m \times n$ matrix $A$ and an $n$-dimensional vector $\mathbf{s}$ produces an $m$-dimensional vector $\mathbf{x} = A\mathbf{s}$, whose elements are

$$x_i = \sum_{j=1}^{n} a_{ij} s_j,$$

where $x_i$ is the $i$-th element of $\mathbf{x}$ and $s_j$ is the $j$-th element of $\mathbf{s}$.
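If these definitions feel abstract, here is each of them in a few lines of NumPy:

```python
import numpy as np

s = np.array([1.0, 2.0])           # a 2-dimensional vector
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])         # a 2 x 2 matrix

print(A.T)                         # transpose: rows become columns
x = A @ s                          # matrix-vector product
print(x)                           # [2*1 + 3*2, 4*1 + 1*2] = [8., 6.]

# Element-wise check of x_i = sum_j a_ij * s_j
assert x[0] == A[0, 0] * s[0] + A[0, 1] * s[1]
```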
ICA model using vector algebra
Using the new vector notation, we define the collection of sources as a vector $\mathbf{s} = [s_1, s_2]^T$.

We similarly define the collection of all our recordings as a vector $\mathbf{x} = [x_1, x_2]^T$.

Finally, we arrange all the coefficients, $a_{11}, a_{12}, a_{21}, a_{22}$, into a $2 \times 2$ mixing matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$

Under the new notation, Eq. (1), (2) can be re-written concisely as follows:

$$\mathbf{x} = A\mathbf{s} \tag{3}$$

In fact, the above equation is unchanged even if we assume that we have a much larger number of sources and recordings (say $n$ of each): $\mathbf{s}$ and $\mathbf{x}$ simply become $n$-dimensional vectors, and $A$ becomes an $n \times n$ matrix of mixing coefficients.
Now, remember our ultimate goal: to extract $\mathbf{s}$ from our recordings $\mathbf{x}$ alone. If the mixing were just multiplication by an ordinary number $a$ (so that $x = as$), and we somehow knew $a$, we could obtain $s$ by simple division: $s = x/a$. But $A$ is a matrix, and there is no such thing as dividing by a matrix.

Thankfully, there is a similar concept for matrices--the matrix inverse, denoted by $A^{-1}$. It is defined so that the product $A^{-1}A$ equals the identity matrix $I$, the matrix that leaves any vector unchanged ($I\mathbf{v} = \mathbf{v}$).

Observe that if we left-multiply both sides of Eq. (3) by $A^{-1}$, we get $A^{-1}\mathbf{x} = A^{-1}A\mathbf{s} = \mathbf{s}$. So if we knew the mixing matrix, recovering the sources would be as easy as computing an inverse.
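Here is a quick numerical check of this identity, pretending for a moment that we know $A$ (which, in the real problem, we don't):

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[2.0, 3.0],
              [4.0, 1.0]])
s = rng.uniform(0, 1, size=2)      # true sources (pretend they're hidden)
x = A @ s                          # what the microphones record

A_inv = np.linalg.inv(A)
print(np.round(A_inv @ A, 10))     # the identity matrix, up to rounding
print(np.allclose(A_inv @ x, s))   # True: sources perfectly recovered
```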
Our primary objective is to find a matrix, let's call it $W$, such that $\mathbf{y} = W\mathbf{x}$ is as close as possible to the original sources $\mathbf{s}$. Of course, we don't know $A$, so we cannot simply set $W = A^{-1}$; we have to estimate $W$ from the recordings alone.

Let's refer to the product of matrices $W$ and $A$ as $P = WA$, so that $\mathbf{y} = W\mathbf{x} = WA\mathbf{s} = P\mathbf{s}$. Ideally, $P$ would be the identity matrix, giving $\mathbf{y} = \mathbf{s}$ exactly. However, two fundamental ambiguities stand in the way.

Firstly, we can't determine the exact order of the sources in $\mathbf{y}$. Nothing in the recordings tells us which source should be labeled "first", so the recovered sources may come out in any order.

Secondly, we won't be able to recover the exact amplitude or loudness of the audio sources. In other words, each recovered source may be an arbitrarily scaled version of the original: doubling a source while halving the corresponding coefficients of $A$ leaves the recordings $\mathbf{x}$ completely unchanged, so the scale of each source simply cannot be determined from $\mathbf{x}$.

In this form, the best we can hope for is that $P = WA$ has exactly one non-zero element in each row and each column, so that each output $y_i$ equals some source $s_j$ multiplied by a constant. That is, $\mathbf{y}$ recovers the sources up to permutation and scaling.
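The sketch below makes both ambiguities concrete: an unmixing matrix whose rows are swapped and rescaled relative to $A^{-1}$ still produces outputs that are just reordered, rescaled copies of the sources:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])
S = rng.uniform(0, 1, size=(2, 5))
X = A @ S

# A "wrong" unmixing matrix: rows of inv(A), swapped and rescaled.
W = np.array([[0.0, 5.0],
              [-2.0, 0.0]]) @ np.linalg.inv(A)

P = W @ A                  # the product P = WA
print(np.round(P, 10))     # one non-zero entry per row/column, not identity
print(np.round(W @ X, 3))  # row 0 is 5*s2, row 1 is -2*s1
print(np.round(S, 3))      # compare with the true sources
```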
Non-Gaussianity for ICA
In the previous section, we found that the output of our ICA algorithm, denoted as $\mathbf{y} = W\mathbf{x}$, can at best recover the sources up to permutation and scaling. But how do we actually find a $W$ that achieves this? The key comes from an unexpected place: the central limit theorem [4].
The central limit theorem (CLT) is a fundamental concept in probability theory, stating that the sum of two or more independent random variables tends to have a distribution closer to Gaussian than the individual variables themselves. In our context, when both $s_1$ and $s_2$ contribute to a linear combination of the recordings, that combination is more Gaussian than either source alone. Flipping this around: a linear combination of the recordings is least Gaussian precisely when it equals (a scaled copy of) a single source. So, to pull out one source, we should search for the combination of recordings that is as non-Gaussian as possible.
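You can watch the CLT in action with a few lines of code: a single uniform variable has a flat histogram, but the sum of just eight of them already looks convincingly bell-shaped:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100_000

one = rng.uniform(0, 1, n)                      # a single uniform variable
summed = rng.uniform(0, 1, (8, n)).sum(axis=0)  # sum of 8 independent ones

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(one, bins=100)
axes[0].set_title("One uniform variable: flat")
axes[1].hist(summed, bins=100)
axes[1].set_title("Sum of 8: already bell-shaped")
plt.show()
```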
An ICA Algorithm
Various algorithms exist for estimating the ICA model, each offering a trade-off between accuracy, computational complexity, and the need for additional information. In this article, we'll delve into a widely adopted algorithm rooted in the principle of non-Gaussianity [3] introduced earlier.
To assess how closely a random variable resembles a Gaussian distribution, we can leverage unique properties of the Gaussian distribution that other distributions do not possess. For this purpose, let's define two vectors: $\mathbf{w} = [w_1, w_2]^T$, a candidate row of the unmixing matrix $W$, and $\mathbf{x} = [x_1, x_2]^T$, our recordings.

Consequently, we can express one output of our algorithm as the single number $y = \mathbf{w}^T\mathbf{x} = w_1 x_1 + w_2 x_2$.

One distinctive property of the Gaussian distribution is its high randomness compared to other distributions. In intuitive terms, randomness of a random variable refers to how spread out the probability density is or, equivalently, how spread out the random variable's samples are. The measure of randomness in a random variable is known as "entropy": among all random variables with a given variance, the Gaussian has the highest entropy. For the purposes of this discussion, understanding the precise mathematical definition of entropy is not crucial. We will assume the existence of an oracle function $H(\cdot)$ that takes the samples of a random variable and returns its entropy. With it, we can measure non-Gaussianity via the "negentropy"

$$J(y) = H(y_{\text{gauss}}) - H(y).$$

Here, $y_{\text{gauss}}$ denotes a Gaussian random variable with the same variance as $y$. Because the Gaussian maximizes entropy for a given variance, $J(y)$ is always non-negative, and it equals zero only when $y$ itself is Gaussian. The larger $J(y)$, the more non-Gaussian $y$ is.
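There is no true entropy oracle, but a crude stand-in is easy to build: estimate $H(y)$ from a histogram of the samples, and use the closed-form entropy of a Gaussian with matching variance for $H(y_{\text{gauss}})$. The sketch below is one such rough approximation (the histogram estimator and bin count are my assumptions, not what production ICA libraries use):

```python
import numpy as np

def entropy(samples, bins=100):
    """Histogram-based estimate of the differential entropy H(y)."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    p = density * np.diff(edges)       # probability mass in each bin
    nz = density > 0
    return -np.sum(p[nz] * np.log(density[nz]))

def negentropy(samples, bins=100):
    """J(y) = H(y_gauss) - H(y), with y_gauss Gaussian of equal variance."""
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(samples))
    return h_gauss - entropy(samples, bins)

rng = np.random.default_rng(0)
print(negentropy(rng.standard_normal(100_000)))   # ~0 for Gaussian samples
print(negentropy(rng.uniform(0, 1, 100_000)))     # ~0.18: clearly positive
```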
The ICA algorithm begins by initializing the elements of $\mathbf{w}$ randomly. It then iteratively adjusts $\mathbf{w}$ to increase the negentropy $J(\mathbf{w}^T\mathbf{x})$, until it converges to a $\mathbf{w}$ that makes $y = \mathbf{w}^T\mathbf{x}$ maximally non-Gaussian; by the argument above, that $y$ is (a scaled copy of) one of the original sources. Repeating the procedure, while forcing each new $\mathbf{w}$ to differ from the ones already found, yields the remaining rows of $W$, one source at a time.
This extraction process effectively separates and recovers the original sources from the mixed recordings.
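Putting everything together, here is a compact sketch of separation by non-Gaussianity. In place of the entropy oracle, it uses the classic kurtosis-based fixed-point update from FastICA [3], applied after whitening the recordings; it is a teaching sketch under those assumptions, not a production implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: two uniform sources, mixed as in our running example.
n = 20_000
S = rng.uniform(0, 1, size=(2, n))          # true sources (hidden in practice)
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])                  # true mixing matrix (also hidden)
X = A @ S                                   # the recordings we actually get

# Step 1: center and whiten, so the recordings have identity covariance.
X = X - X.mean(axis=1, keepdims=True)
eigvals, eigvecs = np.linalg.eigh(np.cov(X))
Z = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ X

# Step 2: find each row w of W by making y = w^T z maximally non-Gaussian,
# using the kurtosis-based fixed-point update of FastICA [3].
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        y = w @ Z
        w_new = (Z * y**3).mean(axis=1) - 3 * w   # fixed-point step
        for j in range(i):                        # deflation: keep rows
            w_new -= (w_new @ W[j]) * W[j]        # orthogonal to earlier ones
        w_new /= np.linalg.norm(w_new)
        if abs(w_new @ w) > 1 - 1e-8:             # direction stopped changing
            w = w_new
            break
        w = w_new
    W[i] = w

Y = W @ Z                   # recovered sources (up to order and scale)

# Each output should correlate (almost) perfectly with exactly one source.
for y in Y:
    print(np.round([np.corrcoef(y, s)[0, 1] for s in S], 3))
```

Running this on the toy mixture should print a correlation close to ±1 between each recovered component and exactly one true source, confirming recovery up to order and sign/scale.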
Try it out!
Now that you have a basic understanding of the ICA problem and a simple algorithm to solve it, here is a Python notebook (prepared by my colleague Jiahui Song) to help you get a better understanding and hands-on experience: ICA Colab Notebook
Conclusion
If you have made it to this point, you hopefully understand the problem of ICA, the basic assumptions needed to solve it, and a basic algorithm for solving it. If you are interested in exploring further, an accessible paper that delves deeper into the subject is [3]. This understanding can also serve as a solid foundation for more intricate and general tasks in machine learning and AI, including nonlinear ICA [6], latent variable models [5], disentanglement, and other key elements of the deep learning domain. Enjoy your exploration!
References
[1] Jutten, Christian, and Jeanny Herault. "Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture." Signal processing 24.1 (1991): 1-10.
[2] Comon, Pierre. "Independent component analysis, a new concept?." Signal processing 36.3 (1994): 287-314.
[3] Hyvärinen, Aapo, and Erkki Oja. "Independent component analysis: algorithms and applications." Neural networks 13.4-5 (2000): 411-430.
[4] "Central Limit Theorem." Wikipedia, Wikimedia Foundation, 25 Oct. 2023, en.wikipedia.org/wiki/Central_limit_theorem.
[5] Goodfellow, Ian, et al. "Deep learning." MIT Press (2016).
[6] Hyvärinen, Aapo, and Petteri Pajunen. "Nonlinear independent component analysis: Existence and uniqueness results." Neural networks 12.3 (1999): 429-439.