The flop — In the eye of the (AI) beholder

5 min read · Mar 1, 2022

Another intense month of R&D has passed, and George now has vision as our poker game progresses! Let’s start with bookkeeping: price indices are stuck in the pipeline as OpenSea continues to push (code-breaking!) updates to its API. This unfortunate development limits progress on our side additions to George. For now, we’ll keep them on hold and explore alternative data providers once the key features are developed. Similarly, George’s chatting ability is on hold for now so that we stay focused on the pricing aspects.

A wise man once said something along the lines of ‘if you can’t explain it in simple words, you don’t understand it well.’ This is especially true in applied research, where ideas are framed into code. Time to break down our vision problem into smaller components.

Let’s focus on a handful of assets from several collections: Azuki, Bored Ape Yacht Club, Cryptopunks, Doodles, and World of Women (by column above). The first thing we observe is meaningful variation of visual features across collections (pick a row and go along it), with limited vertical variation within each collection. Cryptopunks show the least variation of them all. But how do we (non-AI observers) detect this variation? The most obvious difference is due to color changes (pixel variation), but clearly distinct styles and intricate shapes (features) play a crucial part as well. The first NFT from Azuki has a background clearly similar to that of the first BAYC NFT (pixel variation is small as the backgrounds closely match), yet the background is unlikely to be the key driving factor for pricing. This is especially true for Cryptopunks. At the same time, only BAYC NFTs have ‘loopy’ looking ears. So how do we measure these features in AI / ML words? By constructing feature vectors, of course.

Let’s dive straight into the problem by loading the images and splitting them into RGB channels: the direct approach. This yields a large number of features with a low signal-to-noise ratio. The most obvious test is to compare the angular similarity of feature vectors to see how well they tell two NFTs apart. The plot below shows that the direct method has limited ability to differentiate between and within collections. Most of the orange values are in the upper 90s, suggesting that 90%+ of the data within each vector describing an NFT is the same.
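To make the direct approach concrete, here is a minimal sketch in plain Python. It assumes that ‘angular similarity’ means cosine similarity between flattened channel vectors, which is one common reading and not necessarily the exact measure behind the plot. The toy 4x4 images and the helper names (`toy_image`, `flatten_rgb`) are made up for illustration.

```python
import math


def flatten_rgb(image):
    """Split an image into R, G and B channels and concatenate them.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    """
    channels = ([], [], [])
    for row in image:
        for pixel in row:
            for c in range(3):
                channels[c].append(float(pixel[c]))
    return channels[0] + channels[1] + channels[2]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def toy_image(accent, size=4, background=(200, 180, 90)):
    """Hypothetical image: uniform background with one accent pixel."""
    image = [[background] * size for _ in range(size)]
    image[-1][-1] = accent
    return image


# Two toy "NFTs" that share a background and differ in a single pixel.
nft_a = flatten_rgb(toy_image((10, 10, 10)))
nft_b = flatten_rgb(toy_image((250, 250, 250)))

similarity = cosine_similarity(nft_a, nft_b)
print(f"similarity: {similarity:.3f}")
```

Because 15 of the 16 pixels are identical background, the score comes out well above 0.9 even though the only ‘interesting’ pixel is completely different. That is the same effect the plot shows: shared pixels dominate the raw vectors.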

A more refined approach is to focus on the gradient of each color layer. In this case, the background color matters little, as it does not vary much. This approach (gradient, blue above) produces more compact features with a slightly lower noise-to-signal ratio. As expected, each NFT’s similarity to itself is 100%, while similarity to other NFTs varies between 60% and 80%. Yet we still don’t observe the pattern we intuitively identified: lower variation (higher similarity) within a collection and higher variation (lower similarity) across collections. This is not going to cut it!
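A rough sketch of the gradient idea, assuming simple forward differences between neighbouring pixels in each color channel. The actual gradient operator is not specified above, so treat this as one possible choice; the toy 3x3 images are invented for illustration.

```python
def channel(image, c):
    """Extract color channel `c` (0=R, 1=G, 2=B) as a grid of floats."""
    return [[float(pixel[c]) for pixel in row] for row in image]


def gradient_features(image):
    """Concatenate horizontal and vertical forward differences per channel."""
    features = []
    for c in range(3):
        grid = channel(image, c)
        height, width = len(grid), len(grid[0])
        # Horizontal differences: right neighbour minus current pixel.
        for i in range(height):
            for j in range(width - 1):
                features.append(grid[i][j + 1] - grid[i][j])
        # Vertical differences: lower neighbour minus current pixel.
        for i in range(height - 1):
            for j in range(width):
                features.append(grid[i + 1][j] - grid[i][j])
    return features


# A flat background, the same background with one distinct pixel,
# and a copy of the second image with every value shifted by +5.
flat = [[(120, 120, 120)] * 3 for _ in range(3)]
shape = [row[:] for row in flat]
shape[1][1] = (250, 40, 40)
shifted = [[tuple(v + 5 for v in pixel) for pixel in row] for row in shape]

flat_features = gradient_features(flat)
shape_features = gradient_features(shape)
shifted_features = gradient_features(shifted)
print(shape_features == shifted_features)
```

A uniform background contributes only zeros, and shifting every pixel by the same amount leaves the features unchanged, so the background level drops out and only edges and shapes remain.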

Time to bring out the big guns and take measurements from a different perspective. The explosion of cheap and available processing power led to major improvements in the quality of neural networks. Training a supervised classifier from scratch is time- and compute-intensive, but trained networks have one valuable property: transfer learning. A network trained well on a classic classification problem can be reapplied to a different problem and leverage what it learned before. Intuitively, the low-signal, ample feature space we saw in the direct RGB decomposition is filtered down to the key areas of the picture that determine what is being shown. If we go one step further and take the ‘almost’ final output, we capture features with a massively higher signal.

Let’s slice and dice a trained VGG19 model. We are looking at feature vectors produced by a modified and trained VGG19 model applied to our selected NFTs. For the model, the difference between a dog picture and an NFT is limited, as it’s built as a deep convolutional model for large-scale image recognition. While design choices are important, the example below shows the desired behavior:

within a collection, variation tends to be focused on more or less the same features
cross-collection variation happens on different features
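For readers who want to try something similar, here is a hedged sketch of taking the ‘almost’ final output of a pretrained VGG19 with Keras. It is not George’s actual setup, since the modifications mentioned above are not described. It assumes stock ImageNet weights and the second fully connected layer (`fc2` in Keras’ VGG19) as the feature layer; the function names are ours. TensorFlow is imported inside the functions, so it is only needed when they are called.

```python
FEATURE_LAYER = "fc2"  # assumption: last hidden layer before the class scores
INPUT_SIZE = (224, 224)  # input resolution expected by the stock VGG19


def build_feature_extractor(layer_name=FEATURE_LAYER):
    """Return a model mapping an image batch to the chosen layer's output."""
    from tensorflow.keras.applications.vgg19 import VGG19
    from tensorflow.keras.models import Model

    # Downloads the ImageNet weights on first use.
    base = VGG19(weights="imagenet", include_top=True)
    return Model(inputs=base.input, outputs=base.get_layer(layer_name).output)


def extract_features(extractor, image_path):
    """Load one image from disk and return its feature vector."""
    import numpy as np
    from tensorflow.keras.applications.vgg19 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(image_path, target_size=INPUT_SIZE)
    batch = np.expand_dims(img_to_array(img), axis=0)
    return extractor.predict(preprocess_input(batch))[0]
```

Calling `extract_features(build_feature_extractor(), "some_nft.png")` (a hypothetical file path) would return one vector of 4096 values per image, which can then be compared across NFTs in the same way as the simpler features earlier.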

The heat map below can be interpreted as an MRI picture of the brain when an individual observes an NFT. Certain areas are activated for each picture as we observe different features, and they trigger subtle brain pathways. The same happens in the model, where a wide set of stacked matrices is supplied and only certain pathways are triggered.

At the same time, a slightly different framework (below) will yield variation in different areas of the feature vector (as features are not ordered) but will translate to similar patterns (compare the pictures above and below). Below is VGG16, the shallower sibling of VGG19. Overall, VGG16 appears to extract less compressed features with higher variation, a desired property in a big data setting.

At the same time, if we modify ResNet50 for our problem, we get a slightly different variation of the same solution (below). Variation is more compressed within collections compared to the previous cases, but deviations are even larger. What matters most is that we are replicating the previously identified pattern: within-collection and cross-collection variation happen in different areas.
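One simple way to check this pattern numerically, sketched under our own assumptions: rank features by their variance within each collection and measure how much the top-varying sets overlap across collections. The six-dimensional vectors below are toy numbers, not model output, and the helper names are hypothetical.

```python
from statistics import pvariance


def feature_variances(vectors):
    """Variance of each feature (column) across a collection's vectors."""
    return [pvariance(column) for column in zip(*vectors)]


def top_varying(vectors, k):
    """Indices of the k features that vary most within a collection."""
    variances = feature_variances(vectors)
    ranked = sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)
    return set(ranked[:k])


def overlap(a, b):
    """Jaccard overlap between two sets of feature indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy feature vectors: each collection varies on its own pair of features.
collection_a = [
    [0.9, 0.1, 0.0, 0.0, 0.5, 0.5],
    [0.1, 0.8, 0.0, 0.0, 0.5, 0.5],
    [0.5, 0.2, 0.0, 0.0, 0.5, 0.5],
]
collection_b = [
    [0.0, 0.0, 0.7, 0.1, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.9, 0.5, 0.5],
    [0.0, 0.0, 0.4, 0.3, 0.5, 0.5],
]

active_a = top_varying(collection_a, k=2)
active_b = top_varying(collection_b, k=2)
print(sorted(active_a), sorted(active_b), overlap(active_a, active_b))
```

A low overlap between collections means each collection’s variation sits in its own region of the feature vector, which is the behavior the heat maps show.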

Despite the short February, George is now able to translate what he sees in the visual spectrum into feature vectors understandable to AI / ML. Vector variation closely matches what we see intuitively, reassuring us that we are heading in the right direction. The precise details of how we go from the examples above to pricing are something of know-how and our house-specialty spice blend ;)

Over March, we’ll be working out how to attribute variation in visual features within and outside of each collection to the probability of an NFT being liquid (trades more than twice) and the corresponding likely transaction price. This is a natural extension, moving from February’s price indices on overall collections into finer details. Stay tuned and keep safe in these uncertain times!

Written by trustNFT