Any kind of structured or unstructured text data, including emails, presentations, webpages, contracts, CVs, clinical studies, technical reports, handbooks, and social media posts. The content is always identified by meaning, not keywords.
Our primary language is English. However, the functionality to search and process other languages can be added on request.
The Semantic Search Engine can process any kind of text. The engine can identify, among others, the most relevant:
- Documents for a search query
- Answers to a customer query
- Candidates for a job description
- Product recommendations based on purchase history
- Information on market competition
- Sources of evidence within scientific literature
The Semantic Search Engine represents every word with roughly 16,000 semantic features. The engine allows for very fine semantic distinctions, disambiguating terms as required for each use case. For example, the representation of the word 'organ' covers not only the sub-senses 'music' and 'anatomy', but also 'church', 'composer', and 'musical instrument'.
All unstructured text is automatically filtered so that irrelevant text, such as generic introductions and references, and duplicate text are removed. The Semantic Search Engine identifies variations of this type of text throughout the data without requiring exact text searches for each variation.
The Semantic Search Engine can match answers with search queries that use different words. For example, 'done deal' and 'contract signed' are mapped to the same meaning: the conclusion of a business agreement.
Out of the activated semantic features for each of these expressions, a certain percentage of overlap is identified. By measuring this semantic overlap, the Semantic Search Engine understands that both expressions are related and should be mapped to the same meaning within the semantic space.
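The overlap measurement described above can be sketched in a few lines of Python. This is an illustration only: the feature positions are invented, the function name `overlap_ratio` is hypothetical, and the choice to divide by the smaller fingerprint is an assumption, not the engine's documented measure.

```python
from __future__ import annotations

# Illustrative sketch only: a fingerprint is modelled as the set of positions
# of its active semantic features. All positions below are made up.
FINGERPRINT_SIZE = 16_000  # approximate number of semantic features per the text


def overlap_ratio(a: set[int], b: set[int]) -> float:
    """Return the share of active features that two fingerprints have in common.

    The smaller fingerprint is used as the denominator (an assumption), so a
    short expression fully contained in a longer one scores 1.0.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical active-feature positions for three expressions.
done_deal = {12, 305, 977, 4_210, 8_800, 15_031}
contract_signed = {12, 305, 977, 4_210, 9_120, 15_500}
pipe_organ = {44, 2_001, 7_312, 11_090, 13_400, 15_999}

related = overlap_ratio(done_deal, contract_signed)  # 4 shared out of 6 features
unrelated = overlap_ratio(done_deal, pipe_organ)     # no shared features

print(f"done deal vs. contract signed: {related:.2f}")
print(f"done deal vs. pipe organ: {unrelated:.2f}")
```

In this toy data, the two business expressions share four of their six active features, while the unrelated expression shares none, which is the kind of signal the overlap percentage is meant to capture.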
Yes, the engine can process sentences, paragraphs, and documents of any length. Generally, the longer the query, the clearer the context and the more accurate the search results.
The information is stored as semantic fingerprints whereby all terms are mapped to the documents in which they appear to build an inverted index. The Semantic Search Engine converts a query—a word, paragraph, or document—into a semantic fingerprint and compares the query fingerprint to the document fingerprints stored in the index. This allows the engine to quickly look up query terms (once the index is computed) rather than fully scan all documents at query time.
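The lookup idea can be illustrated with a minimal inverted index in Python. This is a sketch under stated assumptions: the index here is keyed by active feature position, the document IDs and feature positions are invented, and `build_index` and `lookup` are hypothetical helper names rather than part of the engine's API.

```python
from collections import Counter, defaultdict

# Illustrative sketch only: each document fingerprint is a set of active
# feature positions (made-up values), and the index maps every position to
# the documents in which it is active.


def build_index(doc_fingerprints):
    """Build the inverted index once, before any query is served."""
    index = defaultdict(set)
    for doc_id, features in doc_fingerprints.items():
        for feature in features:
            index[feature].add(doc_id)
    return index


def lookup(index, query_fingerprint):
    """Count shared features per document by visiting only matching entries."""
    shared = Counter()
    for feature in query_fingerprint:
        for doc_id in index.get(feature, ()):
            shared[doc_id] += 1
    return shared


docs = {
    "contract_a": {1, 2, 3, 4},
    "email_b": {3, 4, 5},
    "report_c": {9, 10},
}
index = build_index(docs)
hits = lookup(index, {2, 3, 4})

# Documents with no shared feature never appear in the result.
print(hits.most_common())
```

Because the query only touches index entries for its own active features, a document such as `report_c` is never visited, which is why a lookup can be faster than scanning every document at query time.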
The Semantic Search Engine can handle any text query—a word, paragraph, or document. Cortical.io can customize the engine to process text documents in any format (for example, .pdf, .doc(x), .xls(x), .csv, .ppt(x), .html, .xml, and .txt).
Note: Audio, video, and image content (except for scanned paper documents) cannot be processed, and numbers in text documents are not converted into semantic fingerprints.
The search results are ranked by their semantic fingerprint similarity. Depending on the use case, fine-grained similarity scores can also be used to compare across different document sections (for example, title, body text, and metadata).
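As a rough illustration of ranking across document sections, the Python sketch below combines per-section similarities into one score. The Jaccard measure, the section weights, and the sample fingerprints are all assumptions chosen for the example; the engine's actual similarity metrics are not specified here.

```python
# Illustrative sketch only: Jaccard similarity stands in for the engine's
# fingerprint similarity, and the weights and fingerprints are made up.


def jaccard(a, b):
    """Similarity of two fingerprints: shared features / all active features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def section_score(query, sections, weights):
    """Weighted average of the query's similarity to each weighted section."""
    total = sum(weights.values())
    weighted = sum(
        weight * jaccard(query, sections.get(name, set()))
        for name, weight in weights.items()
    )
    return weighted / total if total else 0.0


def rank(query, docs, weights):
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, section_score(query, s, weights)) for doc_id, s in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


weights = {"title": 2.0, "body": 1.0}  # hypothetical: a title match counts double
docs = {
    "doc1": {"title": {1, 2}, "body": {1, 2, 3, 4}},
    "doc2": {"title": {7, 8}, "body": {1, 2, 9}},
}
ranking = rank({1, 2}, docs, weights)

for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Weighting sections separately lets a title match count for more than a match buried in body text; the weights shown are placeholders that would be tuned per use case.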
The Semantic Search Engine can process, among others, the following file formats: .pdf, .doc(x), .xls(x), .csv, .ppt(x), .html, .xml, and .txt. Owing to a dedicated OCR pipeline, the engine can also convert scanned paper documents into searchable text.
The Semantic Search Engine expands a text query by matching the query automatically with both exact and approximate results. Custom functionalities can also be added to meet the particular needs of your use case.
As the Semantic Search Engine can be integrated into your existing system, you can change how the search results are displayed in your user interface and adjust the similarity metrics for comparing fingerprints. Cortical.io can also adapt the engine to your use case, for example, by adding components to filter out search results.
This is highly dependent on the use case. In general, the Semantic Search Engine should be trained on the same kind of material that the engine is expected to search and process. For example, to search and identify relevant information in emails, the engine should be trained on emails.
The engine requires little training material, which is particularly helpful in use cases where such material is scarce (for example, in fraud detection).
Training the engine takes only a couple of hours. Some manual fine-tuning is usually required to improve the quality of search results.
No, the Semantic Search Engine learns the vocabulary of your company's business domain by analyzing a corpus of relevant information sources—for example, emails, presentations, webpages, contracts, CVs, clinical studies, technical reports, handbooks, and social media posts—in an unsupervised machine-learning approach.
It takes only a few days to get a fully functioning Semantic Search Engine. Collaborating with subject-matter experts to update the engine might be necessary for more accurate search results.
As new product names, feature names, and other technical terms enter the jargon of your business domain, the Semantic Search Engine can be easily retrained (for example, every 6 or 12 months) on the updated vocabulary of your domain. Retraining can be done in parallel with the normal functioning of the engine and takes only a few hours.
The engine can be integrated into your existing system as a back-end solution through its REST API.
Yes, the engine can in principle be connected to applications like Salesforce, SAP, and SharePoint through its REST API.
The search results can be exported as relational databases and viewed in business intelligence solutions like Tableau.
The engine is easy to scale. We can switch to a more efficient server and/or CPU for more processing power or load-balance between multiple Docker instances to support a higher number of users.
The Semantic Search Engine can index over a million documents in 40 seconds or less. It can also retrieve more than 100 search results from over a million documents in 0.2 seconds or less. Using inverted indexing, the engine can quickly look up query terms (once the index is computed) rather than fully scan all documents at query time.
Note: Processing times can vary considerably depending on document sizes, deployment configurations, and system resources.
The engine can be installed on your own server—on your company’s premises or in your private cloud—or a third-party server. Third-party cloud production environments are currently operating on Google Compute Engine (GCE) and Amazon Web Services (AWS) instances.
- Standalone JVM distribution, JRE version 8+
- Docker Engine
Minimum system resources
For a single instance of the engine:
- 8 GB RAM
- 1 core
SSD space requirements are negligible.