Lexicographers have a hard time these days. With the rise of social media, they are confronted with the appearance of new terms almost in real time. As they observe search trends in online dictionaries, they must react quickly. Some of these new terms are easy to define – bitcoin, blockchain, phishing: articles that deliver contextual information are surfacing across the net like mushrooms on the damp soil of a forest, helping lexicographers to capture the various semantic flavors in a set of definitions. But what about words coined by a pundit in the frenzy of a tweet, words that leave you wondering whether they are just a misspelling or a genuine new word? Should lexicographers ignore them, despite the tens of thousands of lookups?

What makes a word worthy of being added to a dictionary?

Its frequency of use or the interest it generates in people? The lexical experts behind Dictionary.com lean towards the latter criterion.

In a business context, the fact that language evolves rapidly – new vocabulary appearing, meanings shifting, words becoming obsolete – constitutes a technical challenge that has significant economic implications.

Chief information officers want to provide their companies with a competitive edge by mining the mountains of data they have accumulated like squirrels before the winter. To this end, computers are patiently trained to recognize patterns and provide insights, at a huge cost. The problem is that the training material captures the state of language at a given instant. Soon after the system has been trained, the material may well have become, at least partially, obsolete. The system will then perform poorly until it is retrained.

In state-of-the-art machine learning approaches, batch learning techniques predominate. When new vocabulary is to be added, machines are retrained with completely new data sets. This is mainly because efforts to adapt to dynamic environments via online machine learning often suffer from a phenomenon called catastrophic forgetting: when the system is trained with new data, it abruptly loses much of what it had already learned. The Retina Engine needs only a few hours to incorporate new training material, but other machine learning approaches take days or even weeks to process the whole corpus again. Consequently, one of the big questions data scientists struggle with is:

When and how should my model be adapted?

Now, think about it for a second. When coming across new pieces of information, do you ask yourself whether you should assimilate this new knowledge now, later or not at all? I guess not. You probably just take it and move on. It is an intuitive process, a process that characterizes natural intelligence.

What if machines could learn in the same elegant, synthetic way as our brain?

What sounds like a dream might not be impossible, as we discovered in a recent project. Our client, a Fortune 100 company with support call centers all over the world, employs hundreds of agents, who receive thousands of support requests from customers every day. The agents’ main goal is to resolve each support request as quickly as possible to ensure the highest customer satisfaction. They try to leverage the information contained in previous support cases to solve new requests quickly, but their system often delivers inaccurate results: the search engine is confused by differences between the terminology customers use to describe problems and the terminology engineers use to describe solutions. It does not understand terms it has never seen before. The IT department makes a huge effort to update the system, retraining it on a regular basis and adapting it to new vocabulary, but the approach not only swallows millions of dollars, it is also of limited benefit: more new terms appear very soon after the system has been retrained.

When this company incorporated the Cortical.io Retina technology into its workflow, the engineers discovered a system that enables new vocabulary to be identified automatically, even during runtime, without the whole model having to be retrained. Adding new vocabulary is very simple and mirrors the way we humans assimilate new terms. As documents are added or updated, each occurrence of a new term is recorded together with its surrounding context. When enough occurrences of a term have been recorded, the Retina Engine uses the associated contexts to numerically encode the meaning of the term. The encoding is called the term’s semantic fingerprint, and, since the fingerprint is created during runtime instead of during one of the periodic model-training sessions, it is known as a provisional semantic fingerprint. Provisional fingerprints can be used as good approximations of the meaning of new terms until the system is retrained.
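
To make the idea more tangible, here is a minimal Python sketch of the record-then-encode loop described above. It is an illustration under stated assumptions, not the Retina Engine's actual implementation: the class name `ProvisionalFingerprints`, the parameters `min_occurrences` and `max_bits`, and the encoding rule (keep the bit positions that occur most often in the fingerprints of the known surrounding words) are all hypothetical, and the tiny hand-made fingerprints are toy data.

```python
from collections import Counter, defaultdict


class ProvisionalFingerprints:
    """Toy sketch: build provisional fingerprints for unseen terms at runtime.

    Assumption: a fingerprint is a set of active bit positions, and a new
    term's meaning is approximated from the fingerprints of the known
    words that surround it. This is NOT the actual Retina Engine algorithm.
    """

    def __init__(self, known, min_occurrences=3, max_bits=4):
        self.known = {t: frozenset(fp) for t, fp in known.items()}
        self.min_occurrences = min_occurrences  # hypothetical threshold
        self.max_bits = max_bits                # hypothetical sparsity limit
        self.contexts = defaultdict(list)       # new term -> recorded contexts
        self.provisional = {}                   # new term -> fingerprint

    def observe(self, tokens):
        """Record the context of every unknown term in one document."""
        for i, term in enumerate(tokens):
            if term in self.known:
                continue
            context = [t for j, t in enumerate(tokens)
                       if j != i and t in self.known]
            self.contexts[term].append(context)
            # Encode (or refresh) once enough occurrences have been seen.
            if len(self.contexts[term]) >= self.min_occurrences:
                self.provisional[term] = self._encode(term)

    def _encode(self, term):
        """Keep the bit positions most frequent across recorded contexts."""
        counts = Counter()
        for context in self.contexts[term]:
            for word in context:
                counts.update(self.known[word])
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return frozenset(pos for pos, _ in ranked[:self.max_bits])

    def fingerprint(self, term):
        """Return the trained or provisional fingerprint, else None."""
        if term in self.known:
            return self.known[term]
        return self.provisional.get(term)


def overlap(a, b):
    """Similarity as the number of shared active positions."""
    return len(a & b)


# Toy "trained" fingerprints; the positions are made up for illustration.
known = {
    "router": {1, 2, 3}, "network": {2, 3, 4}, "wifi": {3, 4, 5},
    "invoice": {10, 11, 12}, "payment": {11, 12, 13},
}
store = ProvisionalFingerprints(known)
for doc in (["router", "meshnode", "network"],
            ["meshnode", "wifi", "router"],
            ["network", "meshnode", "wifi"]):
    store.observe(doc)

new_fp = store.fingerprint("meshnode")
print(sorted(new_fp))                          # [1, 2, 3, 4]
print(overlap(new_fp, store.known["router"]))  # 3
print(overlap(new_fp, store.known["invoice"])) # 0
```

In this toy run, the made-up term "meshnode" is unknown to the model, yet after three occurrences it receives a fingerprint that overlaps with "router" and not with "invoice". Nothing was retrained; the term simply inherited meaning from the company it keeps.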

There are many good reasons why this major multinational company decided to deploy Cortical.io’s Retina technology. The fact that the time needed by call center agents to solve a support request was reduced by nearly 70% is certainly not the least. But I’d say that what impressed them most was the system’s ability to adapt smoothly to a changing environment, because it comes so near to true intelligence.

Our world, like our language, is constantly evolving. To add real value to our own intelligence, intelligent machines won’t be able to succeed without two fundamental attributes: versatility and adaptiveness.
