First Foray into Neuroscience - Deep Image Reconstruction from Human brain activity

Paper link: Deep image reconstruction from human brain activity

Foreword: This is a very high-level summary of the paper. If you like it, please share it and comment below to let me know your thoughts.

Contributions of this paper:

  • Image reconstruction from fMRI has traditionally been limited to low-level image features. This paper presents a new reconstruction method that uses the hierarchical visual features of a deep neural network, so that the generated images resemble the stimulus images and the subjective visual content during imagery.
  • It introduces a natural image prior that renders semantically meaningful details in the reconstructions by constraining the reconstructed images to be similar to natural images.

My handwritten notes:


Explaining the Markov Property - Part 3

The Markov property:

A decision-making process is said to have the Markov property when the next decision depends only on the current state and not on any of the states that came before it. Such a process is called a Markov process.

For example, playing chess is a Markov process, because it does not matter how you got to the current board position. To make the next move, you do not need to know whether you took their pawns first or how you captured the opponent's queen; all you need is the current layout of the board. So, if the state is the board's layout, then chess is a Markov environment.

However, this is a somewhat deceptive example, because:

  • Depending on how you set up the state, the same game can be modeled as Markov or as non-Markov.

  • Most of the real life examples aren't 100% Markov.

  • Whether your process is Markov or not is a spectrum, not binary.
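To make the first point concrete, here is a minimal sketch (not from the original post) of one process modeled two ways. The bouncing-point environment and the helper names `simulate`, `successors` and `looks_markov` are assumptions made up for illustration, and the check only works for deterministic processes: a state representation "looks Markov" if the same state is always followed by the same next state.

```python
from collections import defaultdict

def simulate(steps=20, lo=0, hi=4):
    """Hypothetical toy process: a point bouncing between lo and hi, one cell per step."""
    pos, vel = lo, 1
    trajectory = []
    for _ in range(steps):
        trajectory.append((pos, vel))
        if not lo <= pos + vel <= hi:
            vel = -vel  # hit a wall: reverse direction
        pos += vel
    return trajectory

def successors(states):
    """Map each state to the set of states observed immediately after it."""
    seen = defaultdict(set)
    for current, nxt in zip(states, states[1:]):
        seen[current].add(nxt)
    return seen

def looks_markov(states):
    """Deterministic check: every state is always followed by the same next state."""
    return all(len(nexts) == 1 for nexts in successors(states).values())

trajectory = simulate()
position_only = [pos for pos, _ in trajectory]

print(looks_markov(position_only))  # state = position only
print(looks_markov(trajectory))     # state = (position, velocity)
```

With the position alone as the state, position 1 is sometimes followed by 2 and sometimes by 0, so the history matters. Once the velocity is included in the state, the next state is fully determined by the current one.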


The Difference between RL and Supervised Learning - Part 2

In Part 1, I briefly highlighted why Reinforcement Learning is so exciting: it's actually a fundamental concept that spans many fields. In this post, we are going to highlight the difference between RL and Supervised Learning, and look for edge cases.

In David Silver's lecture series, he mentioned that the main difference between Supervised Learning and Reinforcement Learning is this:

  • Feedback in Reinforcement Learning could be delayed.
    • I believe his reasoning is that in episodic settings, you won't know the result of your actions until the end of the episode. (By episodic, think of Atari games: each game is an episode, and you won't know whether your actions were good until the end of the game, when you find out whether you won or lost.) In supervised learning, by contrast, you know the result after every batch.
    • My critique: framing it this way is somewhat confusing, because delayed feedback is also common in supervised learning (e.g., semi-supervised approaches such as “survival analysis”, a subfield of ML that mainly deals with modeling the time until an event of interest).

Some people would say that:

  • Reinforcement Learning is a sequential decision making process, where current prediction determines future sampling space, which in turn determines future predictions.
  • My critique: an RNN is a sequence model that takes the previous step's hidden state into consideration when generating the next prediction. You could argue that, in some ways, sequence-to-sequence models (RNNs) [1] also fit in this bucket, so this doesn't uniquely differentiate RL from Supervised Learning. However, if you zoom out one layer, each (sample, label) pair is independent of the others in RNN settings, so you could view an RNN as non-sequential between tasks.

I summarize my findings below. The main difference between Supervised Learning and Reinforcement Learning is twofold:

  • First, feedback type. Reinforcement Learning's feedback is weaker than the feedback in a supervised learning context, i.e., evaluative feedback vs. instructive feedback. Evaluative feedback depends entirely on the action taken, whereas instructive feedback is independent of the action taken. Think of instructive feedback as a teacher grading your math homework and telling you whether your answers are right or wrong. Think of evaluative feedback as a cook tasting the sauce they made and having to decide whether or not to add more salt. The second is more subjective than the first.
    • Feedback could also be sampled and non-exhaustive in RL. Think of performing on stage: you can only look at the front couple of rows to evaluate how well you did, rather than checking the entire audience one by one.
    • Feedback could be sequential instead of one-shot. (Note that in this case, an RNN's feedback is one-shot if we look at it at the sample level rather than the word prediction level.)
    • Some critics might ask: what about off-policy learning? You don't necessarily decide your next action based on the action you just took. An example is Q-Learning: “Q-Learning estimates the return (total discounted future reward) for state-action pairs assuming a greedy policy were followed despite the fact that it could be not following a greedy policy”. Because of exploration steps, you don't always take the greedy action, so the direction of the update can be independent of the action in some cases. But this is a more nuanced argument, because on average off-policy learning (such as Q-Learning) is still going to evaluate the action space from the greedy choices most of the time.
  • Second, data availability. In supervised learning, sample data is gathered independently of the model and the learning algorithm. The goal of prediction is well defined, and the labels (ground truth) are known before the model is developed. In reinforcement learning, however, no sample data is given prior to training; the RL algorithm gathers sample data during its training phase. Design choices such as the definition of rewards (think of eating an apple as +5 to your happiness level) and optimization choices such as policy vs. Q value have a direct impact on predictions of future actions. In other words, the sampling space depends on the design choices of the model and algorithm. If the sampling space is biased for whatever reason, for example if an autonomous driving algorithm prefers sampling highway routes, it won't be able to sample sufficient data on local roads to learn how to handle local traffic.
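Both points can be seen in a few lines of tabular Q-Learning. The sketch below is only an illustration under assumptions of my own: the five-state corridor environment, the `step`, `greedy` and `train` helpers, and the hyperparameters are all made up for this example. Notice that there is no dataset up front (the agent generates its own samples through an epsilon-greedy behaviour policy), the reward only evaluates the action that was actually taken, and the update target assumes greedy behaviour from the next state onward even when the agent explores.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Toy environment (an assumption for this sketch): returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: epsilon-greedy, so the agent gathers its own data.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Evaluative feedback: the reward only scores the action actually taken.
            # Off-policy target: assumes the greedy action is followed from nxt onward.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action for each non-terminal state
```

If the behaviour policy rarely visited some part of the corridor, the Q values there would stay close to their initial zeros, which is the same sampling bias as the highway example above.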

In the next part, I want to talk about this terminology graph and go through the terms one by one.

RL landscape

Reinforcement Learning Isomorphisms - Part 1

This series is inspired by the Feynman learning method and is also motivated by my own personal struggle navigating the sea of information online about reinforcement learning. This will be my best effort at summarizing the field of RL. This includes a list of RL terminologies and concepts, and how they relate to each other.

Part 1 is about isomorphisms of RL across different fields. Part 2 will explain the intricacies of RL terminologies.

 The brain is an agent. The globe is the world.

  • Agent: Gets signals from the world, chooses actions, performs actions, gets a reward (or estimates a reward)
  • Environment: Takes the action from the agent, shows signals the agent can observe.
  • Internal State of an agent in an MDP: a summary of things that you need to know to decide the next action
    • H_t = A_1, O_1, R_1, A_2, O_2, R_2, …, A_t, O_t, R_t (could be all past history)
    • Could be a function of H_t (due to the agent not having a complete memory of all the past history)
  • The difference between reward and value
    • Reward is short term intrinsic desirability from an action
    • Value is long term desirability, total amount of reward that can accumulate from an action
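As a rough illustration of the reward vs. value distinction, the sketch below computes a discounted return from a list of rewards. The discount factor `gamma`, the function name `discounted_return` and the reward numbers are assumptions chosen for this example; the post only says that value is the total reward that can accumulate.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per step into the future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episode: two small costs now, a large payoff later.
rewards = [-1.0, -1.0, 10.0]

immediate_reward = rewards[0]                 # short-term desirability of the first action
long_term_value = discounted_return(rewards)  # long-term desirability, roughly 6.2

print(immediate_reward, long_term_value)
```

The first action looks bad if you only look at its reward, but its value is positive because of what it leads to.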

I will elaborate on the intricacies of these terminologies in Part 2 of this series.

 Parallel concepts!


While this series is primarily focusing on the Machine Learning aspect of RL, it's important to highlight RL techniques' isomorphisms across other fields.

  • Neuroscience:

Neuroscientists have been studying how the brain generates behaviors for decades. At the neural level, reinforcement allows for the strengthening of synaptic associations between pathways carrying conditioned and unconditioned stimulus information. [1]  Or if you'd like to think about the reward function, loosely speaking the human system uses dopamine as our decision reward. 

  • Psychology:

Classical conditioning is learning new behaviors through a series of associations (think covariance matrix and Bayesian inference). Operant conditioning is a learning process through which the strength of a behavior is modified by reward and punishment. RL is more closely related to operant conditioning, because that's literally how you train your little agent (human or mechanical): you reward them for doing things you like so as to encourage them to repeat similar actions.

  • Economics: 

Economic agents were traditionally portrayed as fully rational Bayesian maximizers of subjective utility. However, studies have shown that the agents (us humans) aren't fully rational. We frequently optimize for satisfaction rather than optimality. In other words, due to our limited resources, our rationality is bounded. [2] This is a problem that is well studied in Reinforcement Learning: an agent navigating through a maze with only limited information about the world. Also, behavioral economics is an entire field focused on the study of how agents (either individuals or organizations) make decisions, which is somewhat tangential to instructing an RL agent to make rational decisions. I'd imagine you could borrow a lot of concepts across the two fields.

  • Mathematics:

Operations Research is a field that focuses on using analytical methods to learn how to make better business decisions. How do you efficiently and accurately simulate a system so that you can perform optimizations on top of it to minimize cost, maximize reward, etc.?

Which business decision should you make given the business situation? (Similar to which action should you take given the signals from the environment in RL). This is a question with lots of $$ involved!

  • Engineering:

From Wikipedia: "Optimal Control is a research area where it is focused on finding a control law for a given system such that a certain optimality criterion is achieved. A control problem includes a cost function [3] that is a function of state and control variables. An optimal control is a set of differential equations describing the paths of the control variables that minimize the cost function."

In easy-to-understand terms, we have a set of objectives to optimize and a set of constraints. We are trying to find the best values to assign to our variables so that we maximize reward or minimize cost while at the same time satisfying the constraints.


RL is actually a very fundamental concept. Maybe it can lend hands in helping humans learn a more efficient way of living our lives. I love this weird and elegant parallel concept - because we are actually learning from machines that are learning from humans that are learning from machines that are learning from humans to be better humans and machines.

You get the idea. :P



[1] Computational models of reinforcement learning: the role of dopamine as a reward signal

[2] What is bounded rationality?

[3] Cost Function

Stay tuned for Part 2!

Retort Against "Memorization is not learning"

Let's assume our memory (the one you use for memorization, think short term memory) has a cache size of S (S is a variable term and it changes with time.)

We encounter new information every day. When new information comes into memory, it goes through a condition check:

  • If it fits our past experience (our prior), it gets converted into understanding and is removed from our cache.
  • If it doesn't fit our prior, it stays in memory and sometimes gets dropped if the cache is full. Let's assume that the drop rate is K.

The Revelation

The Conundrum

Ignorance is bliss. It is believed that the more you know, the less likely it is that you can be content and happy. A while ago I read a research study about the relationship between higher IQ and a higher prevalence of depression. (Link: the study linked here is actually a different one from the one I read, and it talks about children. I will add the link here when I find the actual study.)

On the other hand, it could also be argued that your happiness is completely unrelated to how much you know about the world. Deriving your happiness from within sounds extremely ideal. People who believe this often say that happiness is a choice. It certainly sounds very nice, and it makes you want to believe it 100%. The reality, however, is that we are all social creatures and we need each other to coexist in this society. Therefore, happiness is a function of variables from your environment and circumstances.



Shanghai is great. 

It is just different.

Before I left Shanghai for the US, I was a little girl sheltered by my parents. I had a curfew of 11 at night. No bars or clubs were allowed unless I was in the company of “adults”. Therefore, I had only been to bars with my mom to listen to music, which sounds pretty boring, and it WAS actually pretty boring.
