Lucidworks: Deploying Custom Data Science Models with Lucidworks Fusion

Learn how Lucidworks Fusion helps teams reduce friction and accelerate the velocity of data science.

by Sanket Shahane on April 3, 2020

Whether in finance, retail, healthcare, or oil and gas, data science and machine learning have become pervasive across domains and business processes. However, there is no one-size-fits-all ML solution that works for every problem.

Data science teams continuously adopt new frameworks and methods to solve challenges in the best possible way. This puts pressure on engineering and DevOps teams to serve the latest solutions, creating friction at hand-off points and potentially adding technical debt. The biggest challenge these teams face today is moving ML models into production quickly, within a fully functional and performant search application.

This post covers in detail how Lucidworks Fusion reduces the friction of deploying custom machine learning models, but if you’d like to see these tools in action, be sure to also sign up for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.

Machine Learning in Retail and Enterprise Search

Retail search and enterprise document discovery applications use data science as an important ingredient for personalizing these mission-critical applications. Simple keyword matching is no longer enough to satisfy today’s users. Semantic search applied to product recommendations, user query understanding, document categorization, sentiment analysis, and summarization is critical to providing enhanced, personalized experiences for consumers and employees alike. As data science teams strive to build models that meet the expectations of a new generation of users, the ability to move those models smoothly into production is becoming critical.

Data Science Toolkit Integration in Lucidworks Fusion

Lucidworks Fusion is a cloud-native, scalable enterprise document discovery platform built with openness and pluggability at its core. Fusion integrates seamlessly with a variety of commercial and open source machine learning frameworks to derive insights from large unstructured documents. Use cases range from e-commerce search applications to conversational frameworks, support portals, and internal enterprise knowledge discovery applications.

Fusion’s Data Science Toolkit Integration is a model service that integrates seamlessly with query and index pipelines to add intelligence when processing incoming queries and documents. It builds on Seldon Core, an open source framework for managing model deployments. With it, data science teams can develop and validate models built for their specific data and then use Fusion to deploy them in production. This capability helps teams to:

Streamline production of search-focused ML models
Reduce data science teams’ dependencies on DevOps teams, and vice versa
Increase productivity and encourage experimentation to fail fast, iterate, and improve
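To make this concrete, a model served through Seldon Core’s Python wrapper is typically a plain class that loads its artifacts once and exposes a `predict` method. The sketch below follows that general convention. The class name `SentimentModel`, the tiny keyword lexicon standing in for a trained model, and the assumption that Fusion expects exactly this `predict(X, features_names)` shape are all illustrative, not Fusion’s documented contract.

```python
from typing import List, Optional, Sequence


class SentimentModel:
    """Hypothetical model class in the style of a Seldon Core Python wrapper.

    A real deployment would load a trained model in __init__; a tiny keyword
    lexicon stands in for it here so the sketch runs without dependencies.
    """

    def __init__(self) -> None:
        # Placeholder "model": word -> polarity weight (illustrative only).
        self.lexicon = {"great": 1.0, "love": 1.0, "fast": 0.5,
                        "slow": -0.5, "broken": -1.0, "hate": -1.0}

    def _score(self, text: str) -> float:
        words = text.lower().split()
        hits = [self.lexicon[w] for w in words if w in self.lexicon]
        # Average polarity of the matched words; 0.0 when nothing matched.
        return sum(hits) / len(hits) if hits else 0.0

    def predict(self, X: Sequence[Sequence[str]],
                features_names: Optional[List[str]] = None) -> List[List[float]]:
        # X is a batch of rows; each row holds the text to score in column 0.
        return [[self._score(row[0])] for row in X]


if __name__ == "__main__":
    model = SentimentModel()
    print(model.predict([["I love this great phone"], ["screen is broken"]]))
```

Keeping model loading in `__init__` and inference in `predict` means the expensive setup happens once per replica rather than once per request.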

Deployment and Consumption Workflow

Data science teams will:

build models,
validate them against their organization’s problems,
convert them to versioned Docker images, and
register the images with Fusion for deployment.

[Diagram: a typical data science team’s workflow]

The diagram above describes a typical data science team’s workflow. The team first identifies the problem, pulls data from various storage systems, and iterates in Jupyter notebooks with Python ML libraries until it produces a satisfactory model version. It then uses a few simple commands to build a Docker image and publish it to Fusion. Fusion needs a one-time access setup to the organization’s private Docker repository in order to register the image; after that, Fusion can deploy the models on demand and at scale.
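The “simple commands” step might look like the following sketch, which composes the standard `docker build` and `docker push` invocations for a versioned image tag. The registry host `registry.example.com/ds-team`, the image name `sentiment-model`, and the MAJOR.MINOR.PATCH versioning rule are hypothetical placeholders. Registering the pushed image with Fusion is done through Fusion itself and is deliberately not shown, since its exact form is not described here.

```python
import re
import subprocess
from typing import List


def image_tag(registry: str, name: str, version: str) -> str:
    """Build a fully qualified, versioned Docker image reference."""
    # Require a semantic-version-like string so every pushed image is traceable.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return f"{registry}/{name}:{version}"


def build_and_push_commands(tag: str, context_dir: str = ".") -> List[List[str]]:
    """Return the docker CLI invocations for building and pushing one image."""
    return [
        ["docker", "build", "-t", tag, context_dir],
        ["docker", "push", tag],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the workflow if a build or push fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical private registry and model name.
    tag = image_tag("registry.example.com/ds-team", "sentiment-model", "1.2.0")
    run(build_and_push_commands(tag))
```

Rejecting tags such as `latest` keeps each deployment tied to one immutable model version, which makes rollbacks and comparisons between versions straightforward.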

Using models at Query (search) and Index (data ingest) time

Case 1: Processing documents at index time.

When indexing documents from SharePoint, Google Drive, or any other data source, machine learning models can enrich each document with entities, summaries, topics, sentiment scores, and so on.

Documents pass through the following flow: Fusion Connectors → Index Pipeline → Solr Index

Fusion’s Machine Learning Index Stage interacts with deployed ML models, passing documents and predictions back and forth between the pipeline and Seldon Core.
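Between the pipeline and the model server, each batch of documents travels as a JSON prediction request. The sketch below builds and parses payloads in the Seldon Core v1 `{"data": {"names": ..., "ndarray": ...}}` shape. The field names `body_t` and `sentiment_score`, and the assumption that Fusion’s ML stage uses exactly this envelope, are illustrative rather than taken from Fusion’s documentation.

```python
import json
from typing import Any, Dict, List


def to_seldon_request(docs: List[Dict[str, Any]], field: str) -> str:
    """Wrap one text field from each document as a Seldon-style ndarray request."""
    payload = {"data": {"names": [field], "ndarray": [[d[field]] for d in docs]}}
    return json.dumps(payload)


def merge_predictions(docs: List[Dict[str, Any]], response_json: str,
                      target_field: str) -> List[Dict[str, Any]]:
    """Copy row i of the model's ndarray output onto document i."""
    rows = json.loads(response_json)["data"]["ndarray"]
    if len(rows) != len(docs):
        raise ValueError("prediction count does not match document count")
    # Return enriched copies so the caller's input documents are untouched.
    return [{**doc, target_field: row[0]} for doc, row in zip(docs, rows)]


if __name__ == "__main__":
    docs = [{"id": "1", "body_t": "great product"},
            {"id": "2", "body_t": "arrived broken"}]
    print(to_seldon_request(docs, "body_t"))
    # A response shaped like a model server's reply (values made up).
    fake_response = json.dumps({"data": {"names": ["score"], "ndarray": [[0.9], [-0.8]]}})
    print(merge_predictions(docs, fake_response, "sentiment_score"))
```

Matching predictions to documents by position keeps the payload compact; the length check guards against silently attaching a prediction to the wrong document.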

[Diagram: documents flowing through an index pipeline]

The image above shows how documents flow through the stages of an index pipeline, getting enriched at each step before being stored. The Machine Learning Index Stage interacts with Fusion’s ML Service, which in turn talks to Seldon Core. Seldon Core routes each request to the appropriate model while load-balancing across model replicas, which are copies of a model’s Docker image deployed to increase scalability. Finally, the model’s prediction is returned to the pipeline, and the document is enriched with it.
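The flow described above can be sketched as a chain of stage functions, with an ML stage that sends each document to one of several identical model replicas. Round-robin selection is used here only as a simple stand-in for Seldon Core’s load balancing, which in a real cluster is handled by the serving infrastructure. The stage names, the `topic` field, and the toy replica function are hypothetical.

```python
import itertools
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Stage = Callable[[Doc], Doc]


def make_replica(replica_id: int) -> Callable[[str], Dict[str, Any]]:
    """Hypothetical model replica: every replica runs the same toy topic model."""
    def predict(text: str) -> Dict[str, Any]:
        topic = "hardware" if "laptop" in text.lower() else "general"
        return {"topic": topic, "served_by": replica_id}
    return predict


class RoundRobinRouter:
    """Stand-in for load balancing: cycle requests across identical replicas."""

    def __init__(self, replicas: List[Callable[[str], Dict[str, Any]]]) -> None:
        self._cycle = itertools.cycle(replicas)

    def predict(self, text: str) -> Dict[str, Any]:
        return next(self._cycle)(text)


def normalize_stage(doc: Doc) -> Doc:
    """An ordinary, non-ML stage: trim surrounding whitespace."""
    return {**doc, "body": doc["body"].strip()}


def ml_stage(router: RoundRobinRouter) -> Stage:
    def stage(doc: Doc) -> Doc:
        prediction = router.predict(doc["body"])
        # Enrich the document with the model's output before it is indexed.
        return {**doc, "topic": prediction["topic"], "replica": prediction["served_by"]}
    return stage


def run_pipeline(doc: Doc, stages: List[Stage]) -> Doc:
    for stage in stages:
        doc = stage(doc)
    return doc  # this enriched document is what would be sent on to Solr


if __name__ == "__main__":
    router = RoundRobinRouter([make_replica(0), make_replica(1)])
    stages = [normalize_stage, ml_stage(router)]
    for body in ["  New laptop review ", "Quarterly HR update", "Laptop battery tips"]:
        print(run_pipeline({"body": body}, stages))
```

Because every replica runs the same model, adding replicas increases throughput without changing any prediction, which is why the router can pick among them freely.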

Case 2: Processing User Queries

When user queries are processed in real time (from a search front end such as an e-commerce website or an internal knowledge discovery portal), they can be passed through ML models to predict user-intent attributes such as brand affinity, the product category the query is looking for, the expected color, and so on.

Queries pass through the following flow: Front End → Query Pipeline → Solr Index → Response → Front End

[Diagram: a user query traveling through a query pipeline]

The diagram above shows how a user query travels through a Fusion query pipeline, where the Machine Learning Query Stage interacts with deployed ML models, passing queries and predictions back and forth between the pipeline and Seldon Core. The predictions can then be used as Solr boost or filter parameters. For example, a model might predict department:electronics for the query “ipad”.
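One way to apply such a prediction is to translate it into standard Solr request parameters: a filter query (`fq`) restricts results to the predicted department, while an eDisMax boost query (`bq`) only favors them. The confidence threshold, the boost weight, and the way the prediction is passed in are assumptions chosen for illustration; this is not necessarily how Fusion’s Machine Learning Query Stage maps predictions to parameters.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def solr_params_for_prediction(query: str, field: str, value: str,
                               confidence: float,
                               filter_threshold: float = 0.9,
                               boost: float = 2.0) -> List[Tuple[str, str]]:
    """Map a predicted field value to Solr request parameters.

    High-confidence predictions become a hard filter (fq); lower-confidence
    ones become a soft eDisMax boost (bq) so other results can still appear.
    Threshold and boost values are illustrative defaults, not recommendations.
    """
    params = [("q", query), ("defType", "edismax")]
    clause = f'{field}:"{value}"'
    if confidence >= filter_threshold:
        params.append(("fq", clause))
    else:
        params.append(("bq", f"{clause}^{boost}"))
    return params


if __name__ == "__main__":
    # A model predicts department:electronics for the query "ipad".
    print(urlencode(solr_params_for_prediction("ipad", "department", "electronics", 0.95)))
    print(urlencode(solr_params_for_prediction("ipad", "department", "electronics", 0.60)))
```

Filtering on an uncertain prediction can hide the result the user actually wanted, so falling back to a boost below the threshold trades a little precision for safety.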

Case 3: Post-processing search results at query time.

Responses to user queries from the Fusion backend can also be modified: re-ranking the results, redacting certain documents, promoting personally relevant results based on user information, or surfacing documents by semantic similarity in addition to keyword search.

Queries pass through the following workflow: Front End → Query Pipeline → Solr Index

[Diagram: post-processing search results in a query pipeline]

The Machine Learning Query Stage interacts with deployed ML models, passing response documents and predictions back and forth between the pipeline and Seldon Core. The re-ranked or altered results can then be passed on to the front end. Models that re-rank results this way are commonly known as learning-to-rank (LTR) models.
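A minimal sketch of this post-processing step: score each returned document with a model and re-sort, blending the model score with Solr’s original relevance score. The linear blend, the `alpha` weight, and the toy brand-preference model are assumptions for illustration; an actual learning-to-rank model would learn its scoring function from relevance judgments or click data rather than use a hand-written rule.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def rerank(docs: List[Doc], model_score: Callable[[Doc], float],
           alpha: float = 0.5) -> List[Doc]:
    """Re-order Solr results by a blend of Solr's score and a model score."""
    if not docs:
        return []
    # Guard against an all-zero result set to avoid dividing by zero.
    max_solr = max(d["score"] for d in docs) or 1.0
    rescored = []
    for d in docs:
        # Normalize Solr scores to [0, 1] so they are comparable with the model's.
        blended = alpha * (d["score"] / max_solr) + (1 - alpha) * model_score(d)
        rescored.append({**d, "rerank_score": blended})
    # sorted() is stable, so ties keep Solr's original order.
    return sorted(rescored, key=lambda d: d["rerank_score"], reverse=True)


def prefers_acme(doc: Doc) -> float:
    """Hypothetical personalization model: this user favors the "acme" brand."""
    return 1.0 if doc.get("brand") == "acme" else 0.0


if __name__ == "__main__":
    results = [
        {"id": "a", "score": 10.0, "brand": "other"},
        {"id": "b", "score": 8.0, "brand": "acme"},
        {"id": "c", "score": 5.0, "brand": "other"},
    ]
    print([d["id"] for d in rerank(results, prefers_acme)])
```

With `alpha = 0.5`, document `b` overtakes `a` because its strong model score outweighs a modest gap in keyword relevance; raising `alpha` keeps the order closer to Solr’s.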

Lucidworks Models

Lucidworks has deployed multiple deep-learning-based ML models on this framework, available to Fusion users out of the box:

Sentiment analysis (small text)
Sentiment analysis (large text)
Semantic search apps:
  1. Smart Answers (coming soon)
  2. Zero search results treatment (coming soon)

See Fusion in Action

If you want to learn more and see Fusion’s capabilities for data science in action, register for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.
