Cortical.io has developed a new machine learning approach inspired by the latest findings on the way the brain processes information. Semantic Folding proposes a statistics-free processing model that uses similarity as a foundation for intelligence. It breaks with traditional methods based on pure word count statistics or linguistic rule engines.
"By mimicking the understanding process of the brain, we benefit from millions of years of evolutionary engineering to help us solve the hottest NLP challenges today," explains Francisco Webber, inventor and co-founder.
Semantic Folding creates a new data representation, the Semantic Fingerprint, that encodes meaning explicitly, including all senses and contexts. The system “understands” the relatedness of two items by measuring the overlap of their fingerprints. As a result, it is very fast, reliable and easy to implement: a breakthrough technology that leverages the intelligence of the brain to enable the Natural Language Processing of Big Text Data.
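To make the overlap idea concrete, here is a minimal sketch that treats a fingerprint as the set of its active bit positions and counts the positions two fingerprints share. The function names, the normalization by the smaller fingerprint, and the toy bit positions are all assumptions made for illustration; they are not taken from the Cortical.io API or from real Retina output.

```python
def overlap(fp_a, fp_b):
    """Count the bit positions that are active in both fingerprints."""
    return len(set(fp_a) & set(fp_b))


def overlap_similarity(fp_a, fp_b):
    """Shared active bits divided by the size of the smaller fingerprint.

    This normalization is an illustrative choice, not the vendor's metric.
    Returns a value between 0.0 (nothing shared) and 1.0 (one contains the other).
    """
    a, b = set(fp_a), set(fp_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical fingerprints: made-up active-bit positions for three terms.
dog = {3, 17, 42, 256, 901}
wolf = {3, 17, 42, 300, 777}
car = {5, 88, 256, 1024, 4000}

print(overlap(dog, wolf), overlap_similarity(dog, wolf))
print(overlap(dog, car), overlap_similarity(dog, car))
```

In this toy data, "dog" and "wolf" share more active positions than "dog" and "car", so they score as more closely related. Because the comparison is a set intersection, no weights or trained parameters are involved.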
To begin with, we have produced a general purpose English Retina by ingesting the entirety of Wikipedia. New input in Wikipedia can be automatically assimilated. This guarantees that the Retina evolves along with living language and continuously captures the transformation of its socio-cultural context. We have also begun to create new Retinas for other languages (French, German, Chinese, etc.) and can demonstrate that semantic spaces are stable across languages.
The Retina converts words into semantic fingerprints, a numerical representation that captures the meaning behind natural language.
While traditional NLP systems are based on word frequency calculations, Cortical.io's Retina uses a substantially finer-grained representation for every word: 16,000 semantic features are captured for every term.
Semantic fingerprints are encoded in the form of a Sparse Distributed Representation (SDR): a data structure made up of a large number of individual bits, each of which can be turned on or off. The meaning of a fingerprint is determined by the behavior of these bits, with each one contributing a small amount to the overall meaning.
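The structure described above can be sketched as a long list of bits in which only a few are switched on. The example below is a toy illustration under stated assumptions: the 64-bit size, the helper names, and the chosen positions are hypothetical and do not reflect the Retina's actual layout or encoding.

```python
def make_sdr(size, positions):
    """Build a bit list of length `size` with the given positions set to 1."""
    bits = [0] * size
    for pos in positions:
        if not 0 <= pos < size:
            raise ValueError(f"position {pos} is outside 0..{size - 1}")
        bits[pos] = 1
    return bits


def sparsity(bits):
    """Fraction of bits that are on; a sparse representation keeps this low."""
    return sum(bits) / len(bits)


def active_positions(bits):
    """Return the indices of the bits that are on."""
    return [i for i, bit in enumerate(bits) if bit]


def shared_bits(bits_a, bits_b):
    """Count positions where both representations have an active bit."""
    return sum(a & b for a, b in zip(bits_a, bits_b))


SIZE = 64  # toy size chosen for readability only
original = make_sdr(SIZE, [2, 9, 40])
degraded = make_sdr(SIZE, [2, 9])  # same pattern with one active bit lost

print(active_positions(original), sparsity(original))
print(shared_bits(original, degraded))
```

Losing one active bit removes only part of the shared pattern, which illustrates the idea that each bit contributes a small amount to the overall meaning rather than carrying it alone.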
According to recent findings from the field of neuroscience, this same mechanism is used by the brain to process information. In nature, SDRs encapsulate the information processed by the brain at a given moment, with each active cell bearing some semantic aspect of the overall message. The Cortical.io Retina mimics biology in this respect, offering a fundamentally new approach to Natural Language Processing.