Lucidworks: Deploying Custom Data Science Models with Lucidworks Fusion





Learn how Lucidworks Fusion helps teams reduce friction and accelerate the velocity of data science.

By Sanket Shahane on April 3, 2020


Whether in finance, retail, healthcare, or oil and gas, data science and machine learning are pervasive across all domains and business processes. However, there is no “global” ML solution that works for all problems. 

Data science teams are continuously adopting new frameworks and methods to solve challenges in the best possible way. This puts pressure on engineering and DevOps teams to serve the latest solutions, with friction at hand-off points and potentially higher technical debt. The biggest challenge these technical experts face today is taking ML models to production quickly in the context of a fully functional and performant search application.

This post covers in detail how Lucidworks Fusion reduces the friction of deploying custom machine learning models. If you’d like to see these tools in action, be sure to also sign up for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.

Machine Learning in Retail and Enterprise Search

Retail search and enterprise document discovery applications rely on data science to personalize mission-critical experiences. Simple keyword matching is no longer enough to satisfy today’s users. Semantic search applied to product recommendations, user query understanding, document categorization, sentiment analysis, and summarization is critical to providing enhanced, personalized experiences to consumers as well as employees. As data science teams strive to build models that satisfy the requirements of a new generation of users, the ability to smoothly take those models into production is becoming essential.

Data Science Toolkit Integration in Lucidworks Fusion

Lucidworks Fusion is a cloud-native, scalable enterprise document discovery platform built with openness and pluggability at its core. Fusion seamlessly integrates with a variety of commercial and open source machine learning frameworks to derive insights from large unstructured documents. Use cases vary from e-commerce search applications, to conversational frameworks, to support portals and internal enterprise knowledge discovery applications.

Fusion’s Data Science Toolkit Integration is a model service that integrates with query and index pipelines to add intelligence when processing incoming queries and documents. Fusion integrates with Seldon Core, an open source framework for model deployment management. The Data Science Toolkit Integration enables data science teams to develop and validate models built for specific data, then use Fusion to deploy them in production. This capability helps teams to:

  • Streamline production of search-focused ML models 
  • Reduce data science teams’ dependencies on DevOps teams and vice versa
  • Increase productivity, drive experimentation to fail fast, iterate, and improve

Deployment and Consumption Workflow

Data science teams will:

  • build models
  • validate them against the organization’s problems
  • convert them to versioned Docker images
  • register the images with Fusion for deployment

[Diagram: data science team workflow, from model development to registration with Fusion]

The diagram above describes a typical data science team’s workflow. The team first identifies the problem, pulls data from various storage systems, and iterates in Jupyter notebooks with Python ML libraries until it produces a satisfactory model version. The team then uses simple commands to build a Docker image and publish it to Fusion. Fusion needs a one-time access setup to the organization’s private Docker repository to register the image. Fusion can then deploy the models on demand at scale.
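As an illustration of what gets packaged into the Docker image, here is a minimal sketch of a model class. It assumes the Seldon Core Python wrapper convention of a plain class exposing a `predict(X, features_names)` method; the class name `SentimentModel`, the word lists, and the scoring rule are hypothetical stand-ins for a trained model, and the article does not show the exact contract Fusion expects or the build commands.

```python
from typing import List, Optional, Sequence


class SentimentModel:
    """Toy model in the style of a Seldon Core Python-wrapper class.

    Hypothetical example: a real model would load trained weights
    instead of using fixed word lists.
    """

    POSITIVE = {"good", "great", "love", "excellent"}
    NEGATIVE = {"bad", "poor", "hate", "terrible"}

    def __init__(self) -> None:
        # Placeholder for loading model artifacts baked into the image.
        self.version = "0.1.0"

    def predict(self, X: Sequence[Sequence[str]],
                features_names: Optional[List[str]] = None) -> List[List[float]]:
        """Return one score in [-1, 1] per input row; each row holds one text."""
        scores = []
        for row in X:
            tokens = str(row[0]).lower().split()
            pos = sum(t in self.POSITIVE for t in tokens)
            neg = sum(t in self.NEGATIVE for t in tokens)
            total = pos + neg
            scores.append([0.0 if total == 0 else (pos - neg) / total])
        return scores


model = SentimentModel()
predictions = model.predict([["I love this great phone"], ["terrible and bad"]])
```

Keeping the model behind a single `predict` method means the notebook code used for validation and the code served in production can be the same class.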

Using models at Query (search) and Index (data ingest) time

Case 1: Processing documents at index time.

When indexing documents from SharePoint, Google Drive, or any other data source, machine learning models can enrich each document with entities, summaries, topics, sentiment scores, and so on.

Documents pass through the following flow: Fusion Connectors → Index Pipeline → Solr Index

Fusion’s Machine Learning Index Stage interacts with deployed ML models and passes documents and predictions back and forth between the pipeline and Seldon Core.

[Diagram: document flow through the index pipeline, the Machine Learning Index Stage, Fusion’s ML Service, and Seldon Core]

The image above describes how documents flow through the stages of an index pipeline, getting enriched at each step before being stored. The Machine Learning Index Stage interacts with Fusion’s ML Service, which then talks to Seldon Core. Seldon Core routes requests to the respective models while load-balancing between model replicas. Finally, the model’s prediction is returned to the pipeline and the document is enriched with it. Model replicas are copies of model Docker images deployed to increase scalability.
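The enrichment step can be sketched in a few lines of Python. This is a conceptual illustration only, not Fusion's or Seldon Core's implementation: the function names, the field names `body` and `sentiment_score`, and the round-robin routing are all assumptions, since the article does not say which load-balancing strategy is used.

```python
from itertools import cycle
from typing import Callable, Dict, Iterator, List

Document = Dict[str, object]
Model = Callable[[str], float]


def make_router(replicas: List[Model]) -> Iterator[Model]:
    """Round-robin over model replicas (a stand-in for load balancing)."""
    return cycle(replicas)


def ml_index_stage(doc: Document, router: Iterator[Model],
                   source_field: str = "body",
                   target_field: str = "sentiment_score") -> Document:
    """Send one field to the next replica and store the prediction on a copy."""
    model = next(router)
    enriched = dict(doc)
    enriched[target_field] = model(str(doc.get(source_field, "")))
    return enriched


def toy_sentiment(text: str) -> float:
    # Hypothetical model: 1.0 if the text contains "great", else 0.0.
    return 1.0 if "great" in text.lower() else 0.0


# Two replicas of the same model share the incoming documents.
router = make_router([toy_sentiment, toy_sentiment])
docs = [{"id": "1", "body": "Great quarterly results"},
        {"id": "2", "body": "Meeting notes"}]
indexed = [ml_index_stage(d, router) for d in docs]
```

Each document leaves the stage with one extra field, which is what would be stored in the index alongside the original content.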

Case 2: Processing User Queries

When processing user queries in real time (from a search front end such as an e-commerce website or an internal knowledge discovery portal), queries can be passed through ML models to predict user intent attributes such as brand affinity, the product category the query is looking for, or the expected color.

Queries pass from: Front End → Query Pipeline → Solr Index → Response → Front End

[Diagram: user query flow through a Fusion query pipeline and the Machine Learning Query Stage]

The diagram above shows how a user query travels through a Fusion query pipeline. The Machine Learning Query Stage interacts with deployed ML models, passing queries and predictions back and forth between the pipeline and Seldon Core. The predictions can then be used as Solr boost or filter parameters. For example, a model can predict department:electronics for the query “ipad”.
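To make the boost-or-filter idea concrete, the sketch below turns a predicted `department:electronics` into standard Solr request parameters: `fq` for a hard filter, or `bq` (an eDisMax boost query) for a soft boost. The function name, the confidence threshold, and the boost factor are assumptions for illustration; the article does not describe how Fusion chooses between the two.

```python
from typing import Dict, List
from urllib.parse import urlencode


def apply_prediction(query: str, field: str, value: str,
                     confidence: float, filter_threshold: float = 0.9,
                     boost: float = 5.0) -> Dict[str, List[str]]:
    """Turn a predicted field:value into Solr fq (filter) or bq (boost) params."""
    params: Dict[str, List[str]] = {"q": [query], "defType": ["edismax"]}
    clause = f'{field}:"{value}"'
    if confidence >= filter_threshold:
        # High confidence (assumed rule): restrict results to the prediction.
        params["fq"] = [clause]
    else:
        # Lower confidence: only boost matching documents.
        params["bq"] = [f"{clause}^{boost}"]
    return params


params = apply_prediction("ipad", "department", "electronics", confidence=0.95)
query_string = urlencode(params, doseq=True)
```

Filtering removes non-matching documents entirely, so boosting is the safer choice when a wrong prediction would otherwise hide relevant results.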

Case 3: Post-processing search results at query time.

Responses to user queries from the Fusion backend can also be modified, for example to alter the ranking of results or to redact certain documents. This makes it possible to promote personally relevant results based on user information, or to surface documents based on semantic similarity in addition to keyword matches.

Queries pass through the following workflow: Front End →  Query Pipeline → Solr Index


The Machine Learning Query Stage interacts with deployed ML models and passes response documents and predictions back and forth between the pipeline and Seldon Core. The re-ranked or altered results can then be passed on to the front end. Models that re-rank results in this way are popularly known as learning to rank (LTR) models.
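A minimal sketch of this post-processing step follows. It is not a real learning to rank model: it simply drops redacted documents and re-sorts the rest by a linear blend of the original Solr score and a model score. The function name, the `score` and `category` fields, and the blending weight are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

Result = Dict[str, Any]


def post_process(results: List[Result],
                 model_score: Callable[[Result], float],
                 redacted_ids: Set[str],
                 weight: float = 0.5) -> List[Result]:
    """Drop redacted documents, then sort by a blend of Solr and model scores."""
    visible = [r for r in results if r["id"] not in redacted_ids]

    def blended(r: Result) -> float:
        return (1 - weight) * float(r["score"]) + weight * model_score(r)

    return sorted(visible, key=blended, reverse=True)


def prefers_tablets(result: Result) -> float:
    # Hypothetical personalization model: this user favors tablets.
    return 1.0 if result.get("category") == "tablets" else 0.0


results = [
    {"id": "a", "score": 0.9, "category": "cases"},
    {"id": "b", "score": 0.8, "category": "tablets"},
    {"id": "c", "score": 0.7, "category": "tablets"},
]
reranked = post_process(results, prefers_tablets, redacted_ids={"c"})
```

Here document "b" moves ahead of "a" because the model score outweighs the small gap in Solr scores, while "c" is removed before ranking.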

Lucidworks Models

Lucidworks has deployed multiple deep learning-based ML models on this framework, available to Fusion users out of the box.

  1. Sentiment analysis small text
  2. Sentiment analysis large text
  3. Semantic search apps
    1. Smart Answers (coming soon)
    2. Zero search results treatment (coming soon)

See Fusion in Action 

If you want to learn more and see Fusion’s capabilities for data science in action, register for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.

