Why is Cortical.io’s Retina Engine different?

Cortical.io has developed Semantic Folding, a novel representation of natural language based on sparse, topological, binary vectors. It overcomes the limitations of word embedding systems that use dense, algebraic models such as Word2Vec or GloVe.

With Semantic Folding:

  • words, sentences and whole texts can be compared to each other
  • the computation of complex NLP operations is highly efficient
  • the system needs only small amounts of training data
  • the system is easy to debug

The comparison below explains these differences in more detail:


Algorithm

  Retina Engine:
    • sparse binary vector representation (see the sketch below)
    • topological feature arrangement enables generalization

  Other Word Embedding Models:
    • dense floating-point vector representation
    • independence of features can lead to false positives
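
To make this concrete, here is a minimal sketch of the sparse binary idea, assuming a toy 128 × 128 semantic map. All active positions below are invented for illustration (the real Retina Engine learns fingerprints from reference text), and the topological arrangement itself is not modelled here.

```python
# A toy sparse binary "fingerprint": the set of active positions on a
# 128 x 128 semantic map. Positions are invented for illustration.

GRID_SIZE = 128 * 128  # 16,384 positions; real fingerprints keep only ~2% active

dog  = {17, 204, 530, 1999, 7431, 9000}       # hypothetical fingerprint
wolf = {17, 204, 530, 2051, 7431, 12002}      # shares many contexts with "dog"
bank = {88, 3120, 4500, 10033, 15001, 16001}  # unrelated term

assert all(p < GRID_SIZE for p in dog | wolf | bank)

def overlap(a: set[int], b: set[int]) -> int:
    """Semantic similarity = number of shared active positions."""
    return len(a & b)

print(overlap(dog, wolf))  # 4 -> semantically close
print(overlap(dog, bank))  # 0 -> unrelated
```

Because similar meanings share active positions, related terms overlap strongly while unrelated terms barely overlap at all.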

Word ambiguity

  Retina Engine:
    • all associated contexts are captured
    • terms can be computationally disambiguated (see the sketch below)
    • composing aggregated representations implicitly disambiguates

  Other Word Embedding Models:
    • only the main sense is represented
    • other meanings interfere as noise
    • no computational disambiguation possible
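
A hedged sketch of how disambiguation can fall out of this representation: if an ambiguous word's fingerprint is the union of its sense-specific contexts, intersecting it with a context fingerprint keeps only the relevant sense. The positions are invented for illustration.

```python
# An ambiguous word captures all of its senses as the union of their
# context positions; intersection with a context selects the active sense.

bank_finance = {10, 11, 12, 13}      # positions for the "money" contexts
bank_river   = {200, 201, 202, 203}  # positions for the "riverside" contexts
bank = bank_finance | bank_river     # the ambiguous term captures both senses

context = {11, 12, 13, 500, 501}     # fingerprint of e.g. "loan interest deposit"

active_sense = bank & context        # only the financial positions survive
print(active_sense)                  # {11, 12, 13}
```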

Compositionality

  Retina Engine:
    • atomic word representations can be aggregated for any text size: sentences, paragraphs, documents, books, etc. (see the sketch below)
    • aggregated representations can be compared to each other

  Other Word Embedding Models:
    • only word vectors OR sentence vectors OR paragraph vectors are possible
    • vectors of different granularities are not compatible with one another
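
A toy sketch of aggregation, assuming the same set-based encoding: counting active positions across words and keeping only the most frequent ones yields a text fingerprint that stays sparse at any text size and remains comparable to word fingerprints. The keep parameter and all positions are illustrative, not the engine's actual parameters.

```python
from collections import Counter

def aggregate(word_fps: list[set[int]], keep: int = 4) -> set[int]:
    """Build a text fingerprint: keep the most frequently active positions."""
    counts = Counter(pos for fp in word_fps for pos in fp)
    return {pos for pos, _ in counts.most_common(keep)}

cat = {1, 2, 3, 9}
dog = {1, 2, 4, 9}
pet = {1, 3, 4, 9}
car = {50, 60, 70, 80}

sentence_a = aggregate([cat, dog, pet])  # animal-themed text
sentence_b = aggregate([dog, car])       # mixed text

# Aggregated fingerprints are directly comparable to each other and to
# single-word fingerprints, using the same overlap measure.
print(len(sentence_a & sentence_b))  # 3
print(len(sentence_a & cat))         # 4
```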

Inspectability

  Retina Engine:
    • semantically grounded features → easily debuggable
    • tuning by content experts

  Other Word Embedding Models:
    • "black box" effect: debugging only by trial and error
    • tuning by machine learning experts

Training Data

  Retina Engine:
    • small amounts of data (high semantic payload)
    • no training data needed for classification
    • no gold-standard data needed

  Other Word Embedding Models:
    • statistical encoding requires large amounts of data (low semantic payload)
    • every classifier needs individual training
    • every classifier needs its own gold-standard data

Language independence

  Retina Engine:
    • semantic spaces can be trained on any language
    • semantic spaces can be aligned easily by an unsupervised method, enabling cross-language-compatible representations

  Other Word Embedding Models:
    • can be trained on any language, but the amount of training data required may become a practical limitation
    • the alignment process is complex due to the large amounts of training data involved

Computational efficiency

  Retina Engine:
    • sparse binary vectors
    • small memory footprint
    • Boolean operators (see the sketch below)

  Other Word Embedding Models:
    • dense double/floating-point vectors
    • large memory footprint
    • complex numerical operators
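
A back-of-the-envelope sketch of the efficiency claim, with illustrative sizes: a dense 300-dimensional float64 vector versus a 16,384-bit binary fingerprint at ~2% sparsity, where similarity reduces to a bitwise AND plus a popcount.

```python
# Illustrative sizes: the dimensions and sparsity are assumptions for
# the sketch, not measured figures for any particular model.

dense_bytes = 300 * 8            # 2,400 bytes for one dense float64 vector
fp_bits     = 16_384             # full binary fingerprint
active      = int(fp_bits * 0.02)
index_bytes = active * 2         # ~654 bytes if stored as 16-bit indices

# With fingerprints packed into integers, similarity is a single bitwise
# AND followed by a popcount (int.bit_count() requires Python 3.10+).
fp_a = sum(1 << i for i in range(0, 1000, 3))  # toy fingerprint A
fp_b = sum(1 << i for i in range(0, 1000, 5))  # toy fingerprint B
similarity = (fp_a & fp_b).bit_count()         # bits set in both

print(dense_bytes, index_bytes, similarity)    # 2400 654 67
```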

Precision

  Retina Engine:
    • precision remains steady across use cases

  Other Word Embedding Models:
    • parameters need to be optimised for every use case

Still not convinced, or need more details? You can read our paper, which compares Semantic Folding to other word embedding models.