A new model for Big Data Semantics

Cortical.io has developed a new machine learning approach inspired by the latest findings on the way the brain processes information. Semantic Folding proposes a statistics-free processing model that uses similarity as a foundation for intelligence. It breaks with traditional methods based on pure word count statistics or linguistic rule engines.

"By mimicking the understanding process of the brain, we benefit from millions of years of evolutionary engineering to help us solve the hottest NLP challenges today," explains Francisco Webber, inventor and co-founder.

Semantic Folding creates a new data representation, the Semantic Fingerprint, that encodes meaning explicitly, including all senses and contexts. The system “understands” the relatedness of two items by measuring the overlap of their fingerprints. As a result, it is very fast, reliable and easy to implement - a breakthrough technology that leverages the intelligence of the brain to enable the Natural Language Processing of Big Text Data.
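
As a rough illustration of measuring relatedness by overlap, the sketch below treats a fingerprint as the set of its active bit positions and counts the positions two fingerprints share. The function names and the toy fingerprints are assumptions made for this example; they are not taken from the Cortical.io API, and real fingerprints are far larger.

```python
def overlap(fp_a: set, fp_b: set) -> int:
    """Number of active bit positions shared by two fingerprints."""
    return len(fp_a & fp_b)


def overlap_ratio(fp_a: set, fp_b: set) -> float:
    """Shared positions divided by the size of the smaller fingerprint."""
    smaller = min(len(fp_a), len(fp_b))
    return overlap(fp_a, fp_b) / smaller if smaller else 0.0


# Hypothetical toy fingerprints (sets of active bit positions).
dog = {3, 17, 42, 256, 1001}
wolf = {3, 17, 42, 900, 1001}
car = {5, 42, 610, 777, 1500}

print(overlap(dog, wolf), overlap_ratio(dog, wolf))  # 4 0.8
print(overlap(dog, car), overlap_ratio(dog, car))    # 1 0.2
```

In this toy data, "dog" and "wolf" share four positions while "dog" and "car" share one, so the first pair would be judged more closely related.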

At a glance

  • New machine learning approach
  • Inspired by the brain
  • Statistics-free
  • No large training data sets required
  • Converts language into semantic fingerprints
  • Measures semantic similarity of language
  • Intrinsically efficient, accurate algorithm

The Retina

  • is the central component of Semantic Folding
  • captures the essence of language
  • learns about a specific language by processing relevant text content via unsupervised learning
  • can be trained with different text collections to specialize in specific topics or language domains

To begin with, we have produced a general-purpose English Retina by ingesting the entirety of Wikipedia. New input in Wikipedia can be automatically assimilated. This ensures that the Retina evolves along with living language and continuously captures the transformation of its socio-cultural context. We have also begun to create new Retinas for other languages (French, German, Chinese, etc.) and can demonstrate that semantic spaces are stable across languages:

Try our Cross-Lingual Topic Analyzer.

Semantic Fingerprinting

The Retina converts words into semantic fingerprints, a numerical representation that captures the meaning behind natural language:

  • Computational operations can be performed on the meaning contained within text data.
  • The Retina can generate semantic fingerprints for language elements like words, sentences and entire documents.
  • Any two pieces of text can be compared, regardless of length or language.
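
One way to picture how texts of different lengths could become comparable is to give every text a fingerprint of bounded size. The sketch below is an assumption for illustration only, not the Retina's documented method: it counts how often each bit position is active across the word fingerprints of a text and keeps the most frequent positions. The name `text_fingerprint` and the parameter `max_bits` are hypothetical.

```python
from collections import Counter


def text_fingerprint(word_fps, max_bits):
    """Combine word fingerprints into one fingerprint of at most max_bits positions.

    word_fps is an iterable of sets of active bit positions, one per word.
    """
    counts = Counter()
    for fp in word_fps:
        counts.update(fp)
    # Rank by how many words activate a position; break ties by position
    # so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {bit for bit, _ in ranked[:max_bits]}


# Hypothetical word fingerprints for a three-word text.
words = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
print(sorted(text_fingerprint(words, max_bits=2)))  # [2, 3]
```

Because the result is again a set of active positions, it can be compared with a word fingerprint or another text fingerprint in exactly the same way.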

While traditional NLP systems are based on word frequency calculations, Cortical.io's Retina uses a substantially finer-grained representation for every word: 16,000 semantic features are captured for every term.

SDRs

Semantic fingerprints are encoded in the form of a Sparse Distributed Representation (SDR): a data structure made up of a large number of individual bits, each of which can be turned on or off. The meaning of a fingerprint is determined by the behavior of these bits, with each one contributing a small amount to the overall meaning.
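
A minimal sketch of such a data structure follows, assuming an SDR is stored as a fixed total size plus the positions of the bits that are on. The class name `SDR`, its methods, and the 16-bit example size are illustrative choices, not part of any Cortical.io interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDR:
    size: int          # total number of bits in the representation
    active: frozenset  # positions of the bits that are turned on

    def __post_init__(self):
        # Every active position must fall inside the representation.
        if any(not 0 <= i < self.size for i in self.active):
            raise ValueError("active position outside the representation")

    def to_bits(self):
        """Dense view: a list with 1 for active bits and 0 otherwise."""
        return [1 if i in self.active else 0 for i in range(self.size)]

    def sparsity(self):
        """Fraction of bits that are turned on."""
        return len(self.active) / self.size


# Toy example: 3 of 16 bits are on.
sdr = SDR(size=16, active=frozenset({1, 5, 12}))
print(sdr.to_bits())   # [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(sdr.sparsity())  # 0.1875
```

Storing only the active positions keeps the structure compact when few bits are on, which is the "sparse" part of the name.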

According to recent findings from the field of neuroscience, this same mechanism is used by the brain to process information. In nature, SDRs encapsulate the information processed by the brain at a given moment, with each active cell bearing some semantic aspect of the overall message. The Cortical.io Retina mimics biology in this respect, offering a fundamentally new approach to Natural Language Processing.