
Frequently Asked Questions
Semantic Search

What kind of information can Cortical.io Semantic Search handle?

Any kind of structured or unstructured text, including emails, presentations, webpages, contracts, clinical studies, technical reports, handbooks, and social media posts.

Which languages are supported?

Semantic Search supports English and German. Thanks to the underlying technology, support for other languages can be added on request.

What are some example results from a query to Cortical.io Semantic Search?

Example results of a search might include:

  • Documents for a search query
  • Answers to a customer query
  • Information on market competition
  • Sources of evidence within scientific literature

How does Semantic Search handle ambiguous search queries?

Cortical.io Semantic Search represents every word with roughly 16,000 semantic features, which together form its semantic fingerprint. This allows for very fine semantic distinctions, disambiguating terms as required for each use case. For example, the fingerprint of the word “organ” contains not only the senses “music” and “anatomy”, but also “church”, “composer”, and “musical instrument”.
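
The idea can be illustrated with a minimal sketch, assuming a fingerprint is a sparse set of active positions out of 16,384 (an assumed size; the text only says "roughly 16,000"). The words, positions, and helper names below are invented for illustration. Real fingerprints come from a trained Retina Database.

```python
FINGERPRINT_SIZE = 16_384  # assumption: stands in for "roughly 16,000" features

# Hypothetical toy fingerprints: each set lists the active feature positions.
ORGAN = frozenset({12, 57, 301, 4410, 9021, 15000})
PIANO = frozenset({12, 57, 88, 4410, 7000})      # shares "music" positions
LIVER = frozenset({301, 9021, 11111, 12000})     # shares "anatomy" positions
CAR = frozenset({5, 6, 7})                       # unrelated


def overlap(a: frozenset, b: frozenset) -> int:
    """Number of semantic features two fingerprints have in common."""
    return len(a & b)


def most_similar(target: frozenset, candidates: dict) -> str:
    """Name of the candidate fingerprint sharing the most features with target."""
    return max(candidates, key=lambda name: overlap(target, candidates[name]))


if __name__ == "__main__":
    print(sorted(ORGAN & PIANO))  # features "organ" shares with a music term
    print(sorted(ORGAN & LIVER))  # features "organ" shares with an anatomy term
    print(most_similar(ORGAN, {"piano": PIANO, "liver": LIVER, "car": CAR}))
```

In this toy setup the same word overlaps with both a music term and an anatomy term, and the surrounding context decides which shared features matter.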

How does Semantic Search handle alternatively phrased search queries?

Cortical.io Semantic Search can match answers with search queries that use different words. For example, “done deal” and “contract signed” would return similar results denoting the conclusion of a business agreement. This is one of the fundamental differences between keyword searches and Semantic Search.

Does Semantic Search handle queries of any length?

Yes, Cortical.io Semantic Search can process sentences, paragraphs, and documents of any length. It even lets users submit a full document as a query for fast retrieval.

How is the information indexed?

On top of regular search indices, we create and index semantic fingerprints for each document. The query is also converted into a semantic fingerprint and compared to the document fingerprints stored in the index. This allows the engine to quickly look up query terms rather than fully scan all documents at query time.
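
The lookup described above can be sketched as a small inverted index that maps each active fingerprint position to the documents containing it. This is an illustrative assumption about how such an index might work, not Cortical.io's implementation. The class name `FingerprintIndex` and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class FingerprintIndex:
    """Toy inverted index: active fingerprint position -> document ids."""

    def __init__(self) -> None:
        self._postings: defaultdict[int, set[str]] = defaultdict(set)

    def add(self, doc_id: str, fingerprint: set[int]) -> None:
        """Register a document under every active position of its fingerprint."""
        for position in fingerprint:
            self._postings[position].add(doc_id)

    def search(self, query_fingerprint: set[int], top_k: int = 10) -> list[tuple[str, int]]:
        """Rank documents by how many active positions they share with the query.

        Only postings for the query's positions are visited, so documents
        with no overlap are never touched.
        """
        counts: Counter[str] = Counter()
        for position in query_fingerprint:
            for doc_id in self._postings.get(position, ()):
                counts[doc_id] += 1
        return counts.most_common(top_k)


if __name__ == "__main__":
    index = FingerprintIndex()
    index.add("contract", {1, 2, 3, 4})   # hypothetical fingerprints
    index.add("recipe", {3, 10, 11})
    index.add("memo", {20, 21})
    print(index.search({1, 2, 3, 99}))
```

New documents can be added with `add` at any time, which matches the idea of extending the index without rebuilding it.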

How are the search results ranked?

The search results are ranked based on a hybrid ranking model that incorporates both exact text matches and semantic fingerprint similarity. Depending on the use case, the weight of each component can be tuned to maximize the precision of the results.
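
A hybrid score of this kind can be sketched as a weighted sum of the two components. The scoring functions, the Jaccard similarity, and the `text_weight` parameter below are assumptions chosen for illustration. The actual ranking model is not specified here.

```python
def text_match_score(query: str, document: str) -> float:
    """Fraction of query words that occur verbatim in the document."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & document_words) / len(query_words)


def fingerprint_similarity(a: set, b: set) -> float:
    """Jaccard similarity between two fingerprints given as sets of positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(text_score: float, semantic_score: float, text_weight: float = 0.5) -> float:
    """Weighted sum of both components; text_weight is the tunable knob."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    return text_weight * text_score + (1.0 - text_weight) * semantic_score


if __name__ == "__main__":
    # No shared words, but (hypothetically) very similar fingerprints.
    text = text_match_score("done deal", "contract signed")
    semantic = fingerprint_similarity({1, 2, 3, 4}, {2, 3, 4, 5})
    print(hybrid_score(text, semantic, text_weight=0.25))
```

Lowering `text_weight` favors documents that are semantically close but worded differently, while raising it favors exact matches.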

What kind of file formats can Semantic Search process?

Cortical.io Semantic Search can process, among others, the following file formats: pdf, doc(x), xls(x), csv, ppt(x), html, xml, and txt. Owing to a dedicated OCR pipeline, the system can also convert scanned paper documents into searchable text.

Is Semantic Search easy to customize?

Yes. Cortical.io Semantic Search can be customized in many ways:

  • Different Retina Databases can be trained to capture different semantic worlds.
  • The ranking model can be customized, using the re-ranker to incorporate user feedback and teach the system how to behave, or adding use case-specific metadata (document categories, annotations, etc.).
  • The search results can be fine-tuned by adjusting the similarity metrics for comparing fingerprints, or by combining semantic fingerprints with full text search (document title, section title, abbreviations, content, etc.).

What kind of training material is required?

This is highly dependent on the use case. In general, Cortical.io Semantic Search should be trained on the same kind of material that it is expected to search: for example, medical textbooks if the search domain is medicine.

Cortical.io Semantic Search requires little training material, which is particularly helpful in use cases where such material is scarce (for example, in fraud detection).

How long does it take before I can use Semantic Search?

It depends on the use case and on the quantity and quality of documents to be indexed, but usually a few weeks suffice to pre-process the corpus, perform several iterations to improve results and train a custom Retina Database (if necessary).

How easy is it to keep Semantic Search up to date with new material?

Semantic Search enables you to add new documents to the index on an ongoing basis, without having to stop the system. If your domain vocabulary is constantly evolving (new product names, feature names, or technical terms), the Retina Database can easily be retrained (for example, every 6 or 12 months) on the updated domain vocabulary of your business.

How does Semantic Search integrate into my existing infrastructure and applications?

Semantic Search can be integrated into your existing systems and applications as a backend solution through its REST API.

How scalable is Semantic Search?

Cortical.io Semantic Search is easy to scale. Because it runs on standard hardware, it can be scaled out and load-balanced across multiple Docker instances to support a higher number of users. Also, with our announcement of Semantic Supercomputing, we plan to make Semantic Search available on a hardware-accelerated platform.

How quickly can Semantic Search process documents?

Typically, a query takes less than one second. The search time increases for larger indices (over 500K documents) and for long queries (over 100 words). Processing times can also vary considerably depending on document sizes, deployment configurations, and system resources.

How precise are the search results delivered by Semantic Search?

Measured with the Normalized Discounted Cumulative Gain (NDCG) metric, the search results delivered by Cortical.io Semantic Search in a recent implementation were 25% better than those of the competition, for keyword queries as well as for natural language questions.
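
For readers unfamiliar with the metric, here is a minimal sketch of NDCG using the common linear-gain formulation, where each result's relevance is discounted by the logarithm of its rank. The relevance grades in the example are invented, and the sketch does not reproduce the evaluation mentioned above.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the returned order divided by the DCG of the ideal order.

    relevances: graded relevance of each result, in the order it was returned.
    k: optional cutoff; only the top k results are scored.
    """
    ranked = list(relevances)[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance grades (3 = highly relevant, 0 = irrelevant).
    print(ndcg([3, 2, 1, 0]))  # results already in ideal order
    print(ndcg([0, 1, 2, 3]))  # same results in the worst order
```

A score of 1.0 means the results appear in the best possible order. Lower scores mean relevant results were ranked too far down.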
