Neural Circuits (summary & commentary) [POST 1]
Digesting OpenAI's work on neural network interpretability.
Introduction
I recently came across a published thread of work out of OpenAI that tries to describe and build an intuition for how neural networks (NNs) encode information in their weights and connections [LINK].
I discovered this article through an internal Slack conversation about some work we are pursuing that tangentially touches on the problems of neural connectionism.
Primarily, we were considering the problem of how temporal and spatial associations are learned in video. We were interested in ideas that could dissociate temporal from pixel activations in our feature maps, similar to the way disentanglement works in generative models à la GANs or VAEs [1] [2] [3]. I was pointed to this article by one of our research engineers when I brought up the neural path activations seen in behaving animal brains, such as monkeys during reaching tasks [4].
What drew me to this article is its stated goal: deriving a systematic intuition for how learned NN filters can compose (in the functional sense [LINK]) semantic representations of their input data, which in OpenAI's case are visual images.
The intuition the authors propose is inspired by biological systems and draws on the development of cell theory and neuroscience. They motivate their work on NN interpretability by drawing an analogy to the invention of the microscope, which allowed biologists to “zoom in” and see cells.
Having a background in Neuroscience and History of Science, I was immediately drawn to the outlined connection to Schwann and his contributions to cell theory [LINK].
Overall, OpenAI’s approach to this work on circuits mirrors similar work in neuroscience. Personally, I believe there are both positive and negative consequences of treating NNs the way neuroscientists treat brains, and I will try to discuss them at the end of this multi-post series.
Echoing the tenets of Schwann’s cell theory, the authors propose three conjectures as an intuition for NNs (I sketch what “directions” means in code right after the claims):
Claim 1: Features
Features are the fundamental unit of neural networks. They correspond to directions. These features can be rigorously studied and understood.
Claim 2: Circuits
Features are connected by weights, forming circuits. These circuits can also be rigorously studied and understood.
Claim 3: Universality
Analogous features and circuits form across models and tasks.
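Before digging into each claim, it helps to make Claim 1’s phrase “correspond to directions” concrete. Below is a minimal sketch of my own (not code from the article): a layer’s response to an input is a vector, and a “feature” is a direction in that vector space, whether it lines up with a single unit or is spread across many.

```python
import numpy as np

# Toy layer with 512 units; its response to one input is an activation vector.
rng = np.random.default_rng(0)
layer_activation = rng.normal(size=512)

# Simplest case: a feature aligned with a single unit (a basis direction).
unit_feature = np.zeros(512)
unit_feature[42] = 1.0

# More generally, a feature can be any unit-norm direction in activation space.
mixed_feature = rng.normal(size=512)
mixed_feature /= np.linalg.norm(mixed_feature)

# "How strongly does this feature fire?" is a projection onto the direction.
print(layer_activation @ unit_feature)   # response of unit 42 alone
print(layer_activation @ mixed_feature)  # response of a distributed feature
```

Under this reading, the claim that features “can be rigorously studied” amounts to saying these directions behave consistently enough to be characterized one at a time.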
Claim 1: Features
At face value, the statement in Claim 1 seems trivially true. Yet, reading deeper into the article, the authors extend this point to claim that these features are “meaningful” and consequently can be “studied and understood” through an anthropomorphic lens. By anthropomorphic, I mean semantically interpretable to humans: for example, a car is made up of doors, wheels, and other visually distinct parts that can be combined to give a general semantic representation of a car. Or, as the authors put it, “high-level features such as ears, automotives, and faces”. This idea is similar to those proposed by Hinton et al. for CapsuleNets [5], which grew out of even earlier ideas around Credibility Networks [6].
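The main tool the authors use to argue that features are meaningful is feature visualization: optimizing an input image to excite a chosen unit. Here is a heavily simplified sketch under my own assumptions; the layer and channel are arbitrary placeholders, and real feature-visualization code adds input parameterization, normalization, and transformation robustness (as in OpenAI’s Lucid library) that I omit for brevity.

```python
import torch
import torchvision

# InceptionV1 ("GoogLeNet") is the model the Circuits thread dissects.
model = torchvision.models.googlenet(pretrained=True).eval()
for p in model.parameters():
    p.requires_grad_(False)

# Capture the activations of one mixed layer via a forward hook.
activations = {}
model.inception4a.register_forward_hook(
    lambda module, inputs, output: activations.update(feat=output)
)

channel = 97  # arbitrary placeholder channel

# Start from noise and ascend the gradient of the channel's mean activation.
img = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
for _ in range(256):
    opt.zero_grad()
    model(img)
    loss = -activations["feat"][0, channel].mean()
    loss.backward()
    opt.step()
# `img` now (noisily) depicts what this channel of mixed4a responds to.
```

Visualizations like this are what let the authors attach human labels such as “curve detector” or “car detector” to individual units.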
The authors accurately point to the existence of the polar opposite view, shared by many scientists, which claims that no such interpretability exists within layers.
That said, thinking through the implications of these two different views leads to interesting conclusions. The first implies that, from a base set of simple features (e.g., curve/edge/color/movement detectors), one can learn to compose larger, semantically coherent visual concepts such as ears or cars.
Surprisingly, this actually mirrors the anatomical/physiological features of the mammalian visual system, notably the processing taking place in the retina and the lateral geniculate nucleus (LGN); see Fig 1.
This would imply that there exists a visual dictionary of elemental visual units that are genetically and experientially encoded in the brain, and similarly learned by NNs. It is important to note that as the visual system is tracked further into the nervous system, coherence in the information disappears, and no larger semantic activators are seen directly upstream of the higher visual cortices. Yet there are certain regions of the brain that do show holistic object activations; common examples are face cells, place cells, and orientation/direction cells [7] [8] [9]. And given the long-distance cross connections discovered in murine brains, the absence of such activators so far does not mean they do not exist; check out the MouseLight project [LINK].
If such a visual dictionary does exist, questions about the composition properties of these atomic units, i.e. circuits, seem a natural next step.
Do they have a maximum capacity for composition? In other words, can we represent any complex object as one neuron fed inputs of different semantic parts (see the sketch below), or is there a maximum semantic complexity? The lack of high-level abstract semantic neurons in the mammalian brain may be an example of such a limit, but to my knowledge this is unknown. Consequently, what governs this maximum and how it is tied to the architecture seem like interesting research questions.
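As a toy illustration of “one neuron fed inputs of different semantic parts”, here is a sketch loosely modeled on the car circuit described in the article; the part names, scores, and weights are all made up, standing in for activations and learned weights from earlier layers.

```python
# Toy "circuit": an object unit wired to part detectors.
# Part scores stand in for activations computed by earlier layers.
parts = {"wheel": 0.9, "window": 0.7, "fur": 0.05}

# A hypothetical "car" unit excites on wheels and windows, inhibits on fur.
weights = {"wheel": 2.0, "window": 1.5, "fur": -2.0}

car_logit = sum(weights[name] * score for name, score in parts.items())
car_activation = max(0.0, car_logit)  # ReLU
print(car_activation)  # 2.75: the parts compose into a "car" response
```

The capacity question above then becomes: how many such part slots can a single unit usefully combine before the composition stops being semantically coherent?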
The other hypothesis, that no anthropomorphic semantic objects exist, leads to a slightly different notion: a visual dictionary may or may not exist, but the high-level boundary functions estimated by the NNs should not be interpretable by humans as parts-based symbolic constructions.
Do visual symbols need to be humanly interpretable at all? This question seems very interesting to me, as it delves into philosophical questions about symbols; check out Gödel, Escher, Bach [LINK] for some ideas, and the crab canon [LINK] in particular.
This defining characteristic is the one I find most interesting to explore, yet it seems overlooked in the article: the symbolic interpretability of features by a human observer, and whether that interpretability has any value to the estimator itself.
This is the first post of a series I will write on this publication. The next one will explore the features of Claim 1 in more detail, first detailing the Inception architecture to set the stage for thinking about features.