
A new language model, inspired by the brain


Hierarchical Temporal Memory including Cortical Learning Algorithms

Semantic Folding Theory and Its Application in Semantic Fingerprinting

A White Paper by Cortical.io
Author: Francisco E. De Sousa Webber

Natural Language Understanding inspired by neuroscience

Semantic Folding makes it possible to:

compare words, sentences, and entire texts with one another semantically

perform text analysis tasks such as classification and semantic search very efficiently

train the system in a completely unsupervised way

train AI models with little training material and without an AI expert

Building on the Hierarchical Temporal Memory (HTM) theory, a computational theory of the human cortex developed by Numenta, Cortical.io has developed Semantic Folding as a corresponding theory of language representation.

Semantic Folding describes a method for converting text into Semantic Fingerprints. Semantic Fingerprints are Sparse Distributed Representations (SDRs) of words: large binary vectors in which only a small fraction of the bits are set, and in which each bit represents a specific piece of semantic information.
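To make the idea of a sparse binary vector concrete, here is a minimal Python sketch that stores only the indices of the set bits. The class name `SparseBinaryVector` is hypothetical and not part of any Cortical.io API; the size of 16,384 bits is taken from the description of Semantic Fingerprints further down this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SparseBinaryVector:
    """Hypothetical SDR container: a binary vector stored as the set of its active bit indices."""

    size: int            # total number of bits, e.g. 16_384 for a 128 x 128 fingerprint
    active: frozenset    # indices of the bits that are set to 1

    def __post_init__(self) -> None:
        # Every active index must address a bit inside the vector.
        if any(not 0 <= i < self.size for i in self.active):
            raise ValueError("active bit index out of range")

    @property
    def sparsity(self) -> float:
        # Fraction of bits that are set; for an SDR this is a small number.
        return len(self.active) / self.size

    def to_dense(self) -> list[int]:
        # Expand to an explicit 0/1 list (useful only for inspection; it wastes memory).
        return [1 if i in self.active else 0 for i in range(self.size)]


fp = SparseBinaryVector(size=16_384, active=frozenset({5, 130, 4_000}))
print(len(fp.active), fp.sparsity)
```

Storing only the active indices keeps memory proportional to the number of set bits rather than to the vector length.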

Many practical problems of statistical Natural Language Processing (NLP) systems, and more recently of Transformer models, can be elegantly overcome by applying Semantic Folding to text processing. These problems include the need to build large training data sets, high computational costs, the fundamental trade-off between precision and recall, and complex tuning procedures.


Semantic Folding explained simply

Semantic Folding converts text into Semantic Fingerprints, which store meaning in a topographic representation: related contexts occupy neighboring positions.

Semantic Fingerprints make it possible to compare the meanings of any two pieces of text directly, and they reveal thousands of semantic relationships.

If two Semantic Fingerprints look similar, that is, if many of their set bits coincide, the underlying texts are semantically similar as well.

With Semantic Folding, semantic spaces are stable across languages, so texts in different languages can be compared directly without machine translation.

How does Semantic Folding work?

First, reference material is selected that represents the domain in which the system is meant to operate: Wikipedia for applications that use general English, or domain-specific document collections for industry-specific applications.

Next, the reference documents are split into context-based snippets, which are distributed across a 2D matrix so that snippets with similar topics (that is, snippets sharing many words) are placed close to each other. The result is a 2D semantic map.
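The page does not say which algorithm arranges the snippets. Purely as an illustration, the toy sketch below uses a small Kohonen-style self-organizing map (SOM), a standard technique for placing similar items near each other on a 2D grid, over bag-of-words vectors. The function names, grid size, and learning-rate and radius schedules are assumptions chosen for readability, not Cortical.io's implementation.

```python
from __future__ import annotations

import math
import random


def bag_of_words(snippet: str) -> set[str]:
    # Lower-cased whitespace tokens; a real system would tokenize more carefully.
    return set(snippet.lower().split())


def best_matching_unit(nodes, vec):
    # Grid cell whose weight vector is closest (squared Euclidean distance) to vec.
    return min(nodes, key=lambda pos: sum((w - x) ** 2 for w, x in zip(nodes[pos], vec)))


def fold_snippets(snippets, width=4, height=4, epochs=50, seed=0):
    """Toy SOM: return {snippet_index: (row, col)}; snippets sharing words tend to land nearby."""
    vocab = sorted(set().union(*(bag_of_words(s) for s in snippets)))
    col_of = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for s in snippets:
        vec = [0.0] * len(vocab)
        for word in bag_of_words(s):
            vec[col_of[word]] = 1.0
        vectors.append(vec)

    rng = random.Random(seed)
    # One randomly initialised weight vector per grid cell.
    nodes = {(r, c): [rng.random() for _ in vocab] for r in range(height) for c in range(width)}

    for epoch in range(epochs):
        progress = epoch / epochs
        lr = 0.5 * (1.0 - progress)                                  # learning rate decays linearly
        radius = 0.5 + (max(width, height) / 2) * (1.0 - progress)   # neighbourhood shrinks
        for vec in rng.sample(vectors, len(vectors)):                # visit snippets in random order
            br, bc = best_matching_unit(nodes, vec)
            for (r, c), weights in nodes.items():
                # Gaussian neighbourhood: cells near the winner move more strongly toward vec.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for i, x in enumerate(vec):
                    weights[i] += lr * h * (x - weights[i])

    return {i: best_matching_unit(nodes, vec) for i, vec in enumerate(vectors)}


snippets = [
    "the cat chased the mouse",
    "a cat and a dog play",
    "the dog chased the cat",
    "stocks fell as the market closed",
    "the market rallied and stocks rose",
]
print(fold_snippets(snippets))
```

Several snippets may end up in the same cell, which fits the description of each position as holding the bag of words of the training snippets placed there.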

In the next step, a vector is created for every word in the reference documents by activating the positions of all snippets that contain that word. The result is a large, binary, very sparsely populated vector called a Semantic Fingerprint.

A Semantic Fingerprint is a vector of 16,384 bits (128×128), in which each bit stands for a concrete context (topic); that context can be represented as the "bag of words" of the training snippets at that position.
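The bit count follows directly from the grid size. Assuming the cells are numbered row by row starting at zero (the page does not specify the numbering), the cell in row r and column c corresponds to bit i:

```latex
n = 128 \times 128 = 16\,384, \qquad
i(r, c) = 128\,r + c, \quad 0 \le r, c \le 127

i(3, 5) = 128 \cdot 3 + 5 = 389, \qquad
i(127, 127) = 128 \cdot 127 + 127 = 16\,383 = n - 1
```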

The entire Semantic Folding process is fully unsupervised.

Applications of Semantic Folding

Semantic Folding provides the basis for high-level Natural Language Processing functions that can be integrated into many different applications.

Semantic Fingerprints can be created for language elements such as words, sentences, and entire documents.
Any two texts can be compared, regardless of their length or language.
The semantic similarity of texts can be "computed" simply by measuring the overlap of their Semantic Fingerprints (the larger the overlap, the more similar the texts).
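The page does not describe how word fingerprints are combined into a sentence or document fingerprint. One plausible approach, assumed here purely for illustration, is to count how often each bit is activated across the words and keep only the most frequent bits so the result stays sparse; similarity is then the overlap of set bits. The `max_active` cut-off and the normalization by the smaller fingerprint are assumptions, not documented parameters.

```python
from collections import Counter


def text_fingerprint(word_fps, max_active):
    """Combine word fingerprints (sets of bit indices) into one sparse text fingerprint.

    Assumed scheme: count how often each bit is active across the words, then keep
    the `max_active` most frequent bits (ties broken by lower bit index).
    """
    counts = Counter()
    for fp in word_fps:
        counts.update(fp)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(bit for bit, _ in ranked[:max_active])


def overlap(fp_a, fp_b):
    # Number of bits set in both fingerprints.
    return len(fp_a & fp_b)


def similarity(fp_a, fp_b):
    # Overlap scaled to 0..1 by the smaller fingerprint (one of several possible normalizations).
    if not fp_a or not fp_b:
        return 0.0
    return overlap(fp_a, fp_b) / min(len(fp_a), len(fp_b))


cat = {1, 2, 3, 40}
dog = {2, 3, 4, 41}
market = {900, 901, 902}

doc1 = text_fingerprint([cat, dog], max_active=4)   # frozenset({1, 2, 3, 4})
doc2 = text_fingerprint([dog], max_active=4)        # frozenset({2, 3, 4, 41})
doc3 = text_fingerprint([market], max_active=4)     # frozenset({900, 901, 902})
print(overlap(doc1, doc2), similarity(doc1, doc2), similarity(doc1, doc3))  # 3 0.75 0.0
```

Because every text, short or long, ends up as a fingerprint of the same size, the same overlap measure applies to words, sentences, and documents alike.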

Semantic Fingerprints are particularly well suited to NLP tasks such as:

Classification: instead of training a classifier on many annotated examples, a reference fingerprint can be used to describe a class.
Semantic search: Semantic Fingerprints increase the accuracy and efficiency of text search. The system simply compares the overlap between the Semantic Fingerprint of the query and those of the indexed documents.
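Both tasks reduce to the overlap measure described above. In this minimal sketch, fingerprints are sets of active bit indices; the reference fingerprints and document index are made-up toy data, and `classify` and `search` are hypothetical helpers, not a product API. How a class's reference fingerprint is built is not specified on this page.

```python
def overlap(fp_a, fp_b):
    # Number of active bits the two fingerprints share.
    return len(fp_a & fp_b)


def classify(doc_fp, class_fps):
    """Pick the label whose reference fingerprint overlaps most with the document."""
    return max(class_fps, key=lambda label: overlap(doc_fp, class_fps[label]))


def search(query_fp, index, top_k=3):
    """Return up to top_k document ids, best overlap first, skipping zero-overlap documents."""
    scored = [(overlap(query_fp, fp), doc_id) for doc_id, fp in index.items()]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by id for stable output
    return [doc_id for _, doc_id in scored[:top_k]]


class_fps = {
    "animals": frozenset({1, 2, 3, 4}),
    "finance": frozenset({900, 901, 902, 903}),
}
index = {
    "d1": frozenset({1, 2}),
    "d2": frozenset({900, 901}),
    "d3": frozenset({2, 3, 4}),
}
print(classify(frozenset({2, 3, 902}), class_fps))  # animals
print(search(frozenset({2, 3, 4}), index))          # ['d3', 'd1']
```

Neither function needs labeled training examples at query time: classification compares against one reference fingerprint per class, and search is a ranking by shared bits.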

Advantages of Semantic Folding

High accuracy

Semantic Fingerprints use a rich set of semantic features, 16k in total (one per fingerprint bit), which enables fine-grained disambiguation of words and concepts.

High efficiency

Semantic Folding needs an order of magnitude fewer training documents (hundreds rather than thousands) and fewer computing resources, because it uses sparse distributed vectors.
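One concrete reason sparse binary vectors are cheap to work with: a fingerprint can be stored as a bitmap, and the overlap of two fingerprints is a bitwise AND followed by a population count. The sketch below uses Python integers as 16,384-bit masks; the helper names are hypothetical, and the storage comparison simply counts bytes for a dense bitmap versus a list of 16-bit indices.

```python
GRID_BITS = 128 * 128  # 16,384 bits per fingerprint, as stated above


def to_mask(active_bits):
    """Pack active bit indices into a Python int used as a bitmap."""
    mask = 0
    for bit in active_bits:
        if not 0 <= bit < GRID_BITS:
            raise ValueError(f"bit {bit} outside 0..{GRID_BITS - 1}")
        mask |= 1 << bit
    return mask


def overlap(mask_a, mask_b):
    # Bitwise AND keeps only the shared bits; counting the 1s gives the overlap.
    return bin(mask_a & mask_b).count("1")


def dense_bytes():
    # A full bitmap needs one bit per map position.
    return GRID_BITS // 8


def sparse_bytes(n_active):
    # Every index is below 2**14, so it fits in an unsigned 16-bit integer.
    return 2 * n_active


a = to_mask([1, 5, 16_383])
b = to_mask([5, 7, 16_383])
print(overlap(a, b), dense_bytes(), sparse_bytes(300))  # 2 2048 600
```

Which representation is cheaper depends on how many bits are active; the page does not state a sparsity level, so the 300 above is only an example value.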

High transparency and explainability

Every semantic feature can be inspected at the document level, that is, traced back to the reference snippets behind it, so biases in the models can be removed and results can be explained.
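Because every bit corresponds to a map position filled with concrete training snippets, a match can be explained by listing the snippets behind the shared bits. The sketch assumes a hypothetical `snippets_at_bit` lookup from bit index to the snippets placed at that position.

```python
def explain_overlap(fp_a, fp_b, snippets_at_bit):
    """Map each shared bit to the training snippets stored at that map position."""
    return {bit: snippets_at_bit.get(bit, []) for bit in sorted(fp_a & fp_b)}


snippets_at_bit = {
    0: ["the cat chased the mouse"],
    1: ["a cat and a dog play"],
    15: ["stocks fell as the market closed"],
}
cat = frozenset({0, 1})
dog = frozenset({1, 7})
for bit, evidence in explain_overlap(cat, dog, snippets_at_bit).items():
    print(bit, evidence)  # 1 ['a cat and a dog play']
```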

High flexibility and scalability

Semantic Folding can be applied to any language and any use case, and business users can easily adapt the models.

