Presenting Retina Spark 2.0

Retina Spark 2.0 is an NLP tool designed specifically for high-performance semantic text processing in an Apache Spark environment. Like the Retina API, it operates at the semantic rather than the keyword level: it measures the similarity in meaning between text passages in order to classify, filter, and search large document repositories.
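
The text does not say how similarity in meaning is computed. As a minimal sketch, assume that each semantic fingerprint is a sparse binary vector stored as the set of its active bit positions. Similarity can then be taken as the cosine of two such vectors, |A ∩ B| / sqrt(|A| * |B|), and search as ranking documents by that score. The class `FingerprintSearch` and its methods are hypothetical illustrations, not part of the Retina Spark API. The linear scan stands in for a real index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: this is NOT the Retina Spark API.
// A fingerprint is modelled as the set of indices of its active bits.
final class FingerprintSearch {

    // Cosine similarity of two binary vectors given as active-bit sets:
    // |A ∩ B| / sqrt(|A| * |B|). Returns 0 if either set is empty.
    static double cosine(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        // Iterate over the smaller set to count shared active bits.
        Set<Integer> small = a.size() <= b.size() ? a : b;
        Set<Integer> large = small == a ? b : a;
        long overlap = small.stream().filter(large::contains).count();
        return overlap / Math.sqrt((double) a.size() * b.size());
    }

    // Returns document ids ordered by decreasing similarity to the query.
    // This scores every document (a linear scan), not an index lookup.
    static List<String> rank(Set<Integer> query, Map<String, Set<Integer>> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(query, docs.get(id))).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> docs = Map.of(
                "doc-a", Set.of(1, 2, 3, 4),
                "doc-b", Set.of(3, 4, 5, 6),
                "doc-c", Set.of(7, 8, 9, 10));
        Set<Integer> query = Set.of(1, 2, 3, 5);
        // Scores: doc-a 0.75, doc-b 0.5, doc-c 0.0
        System.out.println(rank(query, docs)); // prints [doc-a, doc-b, doc-c]
    }
}
```

Storing only the active positions keeps the overlap computation proportional to the number of active bits rather than to the full vector length, which suits sparse fingerprints.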

Retina Spark 2.0 enables the creation of:

  • an index of text or document semantic fingerprints to efficiently search terabytes of unstructured text data
  • a semantic classifier based on positive examples of a class
  • a semantic filter for high-throughput text streams (e.g., a Twitter feed)
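
The list does not say how a classifier is derived from positive examples. The following sketch shows one simple approach, chosen here as an assumption and not taken from Retina Spark itself. It counts how often each bit is active across the positive examples' fingerprints and keeps the bits that reach a minimum count as a class fingerprint. An incoming text is accepted when its cosine similarity to that class fingerprint meets a threshold. Applying the same test to every item of a stream turns the classifier into a filter. The class name, `minCount`, and `threshold` are hypothetical.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only: this is NOT the Retina Spark API.
// Fingerprints are modelled as sets of active bit positions.
final class PositiveExampleClassifier {
    private final Set<Integer> classFingerprint;
    private final double threshold;

    // Keeps the bits that are active in at least minCount positive examples.
    PositiveExampleClassifier(Collection<Set<Integer>> positives, int minCount, double threshold) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> fp : positives) {
            for (int bit : fp) {
                counts.merge(bit, 1, Integer::sum);
            }
        }
        this.classFingerprint = counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
        this.threshold = threshold;
    }

    // Cosine similarity of two active-bit sets; 0 if either is empty.
    static double cosine(Set<Integer> a, Set<Integer> b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        long overlap = a.stream().filter(b::contains).count();
        return overlap / Math.sqrt((double) a.size() * b.size());
    }

    // True if the fingerprint is similar enough to the class fingerprint.
    boolean accepts(Set<Integer> fingerprint) {
        return cosine(classFingerprint, fingerprint) >= threshold;
    }

    // Keeps only the incoming fingerprints that the classifier accepts.
    List<Set<Integer>> filter(List<Set<Integer>> incoming) {
        return incoming.stream().filter(this::accepts).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Set<Integer>> positives = List.of(
                Set.of(1, 2, 3, 4), Set.of(1, 2, 3, 5), Set.of(1, 2, 4, 6));
        // Class fingerprint becomes {1, 2, 3, 4}: bits 5 and 6 occur only once.
        PositiveExampleClassifier clf = new PositiveExampleClassifier(positives, 2, 0.4);
        System.out.println(clf.accepts(Set.of(1, 2, 7, 8))); // prints true (score 0.5)
        System.out.println(clf.accepts(Set.of(5, 6, 7, 8))); // prints false (score 0.0)
        List<Set<Integer>> stream = List.of(
                Set.of(1, 2, 7, 8), Set.of(5, 6, 7, 8), Set.of(1, 2, 3, 9));
        System.out.println(clf.filter(stream).size()); // prints 2
    }
}
```

The `minCount` and `threshold` values above are arbitrary; in practice they would need tuning against labelled data.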

Retina Spark 2.0 is a library that augments Spark MLlib with high-performance semantic text processing capabilities. It is Cloudera certified and can be used with on-premises or cloud-based Spark clusters, including those running the Cloudera and Amazon EMR distributions. Retina Spark 2.0 supports the latest Apache Spark releases and provides Java and Scala APIs.

Apache Spark is an open-source framework and runtime environment for distributed and parallel computing.