Foreword: This is a very high-level summary of the paper. If you like it, please share it and comment below to let me know your thoughts.
Contributions of this paper:
- Image reconstruction from fMRI has traditionally been limited to low-level image content. This paper presents a new reconstruction method that uses the hierarchical visual features of a Deep Neural Network (DNN), so that the generated images resemble both the stimulus images and the subjective visual content during imagery.
- It introduces a natural image prior, which renders semantically meaningful details in the reconstructions by constraining the reconstructed images to be similar to natural images.
My handwritten notes:
Here is the workflow:
- The examiner places each subject in an fMRI scanner and shows them an image X.
- We then compute f(X), where f is the VGG19 model, which gives a list of features for the input X.
- The examiner then takes the fMRI output, call it fMRI(X), and trains a decoder (g) that predicts the features f(X) from it using linear regression models. They used the Sparse Linear Regression algorithm (SLR; Bishop 2006), which can automatically select important voxels for decoding. [Automatic Relevance Determination prior]
- The examiner then finds a new image X* such that equation 1 from above is minimized. V is the output of the fMRI pattern.
- Optionally, a Generator can be placed in front of f: X* can be produced either by optimizing pixel values directly or through a Generator from a GAN.
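To make the workflow above concrete, here is a minimal toy sketch in Python (NumPy only). It is not the paper's code, and everything in it is an assumption made for illustration: `f` is a random linear map standing in for VGG19, the "fMRI" patterns are synthetic, the decoder `g` is plain ridge regression swapped in for SLR, and the loss is a single-layer squared feature error used as a simple stand-in for the paper's objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_features, n_voxels, n_train = 16, 32, 40, 200  # hypothetical sizes

# Stand-in for the DNN f: a fixed random linear map (VGG19 in the paper).
W_f = rng.normal(size=(n_features, n_pixels))

def f(x):
    return W_f @ x

# Synthetic training set: images, their features, and noisy "fMRI" patterns.
X_train = rng.normal(size=(n_train, n_pixels))
F_train = X_train @ W_f.T
W_brain = rng.normal(size=(n_voxels, n_features))  # assumed linear brain response
V_train = F_train @ W_brain.T + 0.1 * rng.normal(size=(n_train, n_voxels))

# Decoder g: ridge regression from voxels to features (NOT SLR, just a stand-in).
lam = 1.0
B = np.linalg.solve(V_train.T @ V_train + lam * np.eye(n_voxels),
                    V_train.T @ F_train)

def g(v):
    return v @ B

# A held-out image, its simulated fMRI pattern, and the decoded features.
x_true = rng.normal(size=n_pixels)
v_test = W_brain @ f(x_true) + 0.1 * rng.normal(size=n_voxels)
y = g(v_test)

# Reconstruct X* by gradient descent on 0.5 * ||f(x) - y||^2 over pixel values.
x = np.zeros(n_pixels)
step = 1.0 / np.linalg.norm(W_f, 2) ** 2  # 1 / Lipschitz constant of the gradient
for _ in range(2000):
    x -= step * (W_f.T @ (f(x) - y))

corr = np.corrcoef(x, x_true)[0, 1]
print(f"correlation between X* and X: {corr:.3f}")
```

In this toy setting the reconstruction is easy because f is linear and has more features than pixels. With a real DNN the same loop would need automatic differentiation and, as far as I understand, a weighted sum of the loss over several layers.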
Here are some comparisons of the generated image X* and the original image:
Contribution 1 - "The benefit of being Deep":
According to the paper, using more DNN layers really helps the generated images recreate the semantic information of the stimulus.
The subjective assessment showed higher rates of choice with reconstructions from a larger number of DNN layers, suggesting the improvement of reconstruction quality by combining multiple levels of visual features.
See figure below:
Contribution 2 - DGN:
- The GAN generator (the deep generator network, DGN) acts as a natural-image prior, so that the search space is more restricted.
- Using the DGN, we are able to render semantically meaningful details.
- Compare this figure, generated using the DGN, with Supplementary Figure 5 above (without the DGN). As we can see, it shows more semantically meaningful detail.
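Here is a rough sketch of why restricting the search to a generator's outputs can help. It is a toy under strong assumptions, not the paper's DGN: the generator is a made-up linear map `M` with a 5-dimensional latent code, the feature extractor `W_f` is random and has fewer features than pixels, and the true image is assumed to lie exactly in the generator's range. The helper `minimize_feature_loss` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels, n_features, n_latent = 64, 20, 5  # hypothetical sizes

W_f = rng.normal(size=(n_features, n_pixels))  # stand-in feature extractor f
M = rng.normal(size=(n_pixels, n_latent))      # stand-in generator: G(z) = M @ z

def minimize_feature_loss(J, y, n_steps=2000):
    """Gradient descent on 0.5 * ||J @ p - y||^2, starting from p = 0."""
    p = np.zeros(J.shape[1])
    step = 1.0 / np.linalg.norm(J, 2) ** 2
    for _ in range(n_steps):
        p -= step * (J.T @ (J @ p - y))
    return p

# A "natural" image (in the range of G) and its noisy decoded features.
z_true = rng.normal(size=n_latent)
x_true = M @ z_true
y = W_f @ x_true + 0.5 * rng.normal(size=n_features)

# Without the prior: optimize all 64 pixel values directly.
x_pixel = minimize_feature_loss(W_f, y)

# With the prior: optimize the 5 latent values, then map back to an image.
z_hat = minimize_feature_loss(W_f @ M, y)
x_latent = M @ z_hat

corr_pixel = np.corrcoef(x_pixel, x_true)[0, 1]
corr_latent = np.corrcoef(x_latent, x_true)[0, 1]
print(f"pixel optimization:  {corr_pixel:.3f}")
print(f"latent optimization: {corr_latent:.3f}")
```

With 20 features and 64 pixels, many images match the target features equally well, so pixel optimization cannot pin down the right one. Searching over the latent code leaves only images the generator can produce, which is the sense in which the prior restricts the search space.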
Here is a comparison with and without the DGN:
Other Findings & Interesting Things:
Luminance Information Missing
The above image shows that the luminance contrast of a reconstruction was often reversed, presumably because of the lack of absolute luminance information in fMRI signals, even in the early visual areas. Additional analysis revealed that feature values of filters with high luminance contrast in the earliest DNN layers were better decoded when they were converted to absolute values, demonstrating a clear discrepancy between fMRI signals and raw DNN signals.
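One way to picture this finding is with a small simulation. This is purely my own illustration on synthetic data, assuming that voxels respond to the magnitude of a filter response and not to its sign. Under that assumption, a linear decoder recovers the absolute feature value well and the signed value poorly. The function name `decode_corr` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, n_voxels = 300, 100, 10  # hypothetical sizes
n_total = n_train + n_test

# Signed response of one high-contrast filter in an early DNN layer (synthetic).
feat = rng.normal(size=n_total)

# Assumption: each voxel sees only the magnitude of the response, plus noise.
gains = rng.uniform(0.5, 1.5, size=n_voxels)
voxels = np.abs(feat)[:, None] * gains + 0.3 * rng.normal(size=(n_total, n_voxels))

def decode_corr(target):
    """Fit a linear decoder on the training split, return held-out correlation."""
    V_tr = np.column_stack([voxels[:n_train], np.ones(n_train)])
    V_te = np.column_stack([voxels[n_train:], np.ones(n_test)])
    w, *_ = np.linalg.lstsq(V_tr, target[:n_train], rcond=None)
    return np.corrcoef(V_te @ w, target[n_train:])[0, 1]

corr_signed = decode_corr(feat)        # decode the raw (signed) feature value
corr_abs = decode_corr(np.abs(feat))   # decode the absolute feature value
print(f"signed target:   {corr_signed:.3f}")
print(f"absolute target: {corr_abs:.3f}")
```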
Reconstructions were evaluated in two ways. Both methods compare the reconstructed image with the original image and with a randomly selected image from the pool:
- Objective Assessment: Using Pearson Correlation
- Subjective Assessment: Asking humans which of the two is more similar to the generated image
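The objective assessment can be sketched as a pairwise identification test. This is a minimal version under my own assumptions: images are flattened and compared by Pearson correlation of pixel values, and each reconstruction is tested against every other image in the pool, not just one randomly drawn image. The names `pearson` and `pairwise_accuracy` are made up for this example.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two images, treated as flat pixel vectors."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

def pairwise_accuracy(recons, originals):
    """Fraction of (true, distractor) pairs where the reconstruction
    correlates more strongly with its own original than with the distractor."""
    wins, trials = 0, 0
    for i, recon in enumerate(recons):
        r_true = pearson(recon, originals[i])
        for j, other in enumerate(originals):
            if j == i:
                continue
            wins += int(r_true > pearson(recon, other))
            trials += 1
    return wins / trials

# Synthetic demo: "reconstructions" are noisy copies of random images.
rng = np.random.default_rng(3)
originals = rng.normal(size=(20, 16, 16))
recons = originals + rng.normal(size=originals.shape)

accuracy = pairwise_accuracy(recons, originals)
print(f"pairwise identification accuracy: {accuracy:.3f}")
```

Chance level for this test is 0.5, which is the baseline behind the "higher than chance" claims. The subjective assessment follows the same two-candidate design, with a human judgement in place of the correlation.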
Generalization to Artificial Shapes
Since both the DNN and the decoding models were trained solely on natural images, this is a challenging test. My first question when I reached this part was: “why didn't they train their own model?” Then I realized that I am reading a neuroscience paper rather than a machine learning paper, so many of the models they use are actually off-the-shelf pre-trained models.
The results show that artificial shapes are reconstructed with moderate accuracy, indicating that the reconstruction generalizes.
They said that evaluations by human judgement suggest that shapes were better reconstructed from early visual areas, whereas colors were better reconstructed from mid-level visual areas.
They also report high reconstruction accuracy on alphabetical letters.
Finally, the most exciting part
When it comes to reconstructing imagined content, the evaluation by humans shows higher-than-chance accuracy. But from the look of it, it is still not performing so well. Possible failure modes:
- Difficult to imagine complex natural images (Fig 4b)
- Possible disagreement in positions, colors and luminance between target and reconstructed images.
As we can see from the above "original image" and generated image, there is not very much similarity. The paper claimed that it has higher than random accuracy. :P
Summary of this summary:
- Similar to the idea that startups can benefit from a lot of APIs, as was mentioned here: https://www.facebook.com/deeplearnerslearn/videos/143981816296629, it seems that non-CS papers use lots of pretrained black-box models without delving much into training the models themselves.
- As we can tell from the last image, we are still pretty far away from being able to reconstruct imaginary content from just looking at the fMRI data.