Our AI/ML Startup’s Tech Stack

Some insight into how we’re building our technology.

Mar 23 · 4 min read

Photo Creds: Unsplash

Here’s some transparency into the technical decisions that have driven our technology, along with some insight into what our stack has made easy and what our tech choices have made hard.

Hopefully, this gives you an idea of some of the relevant technologies in our field. From talking to fellow startup founders, I’ve found that this stack is pretty similar across many other machine-learning-focused data teams, with some variation depending on industry and personal circumstance.

Overview of our stack:

Spawner API:

  • Languages: Python, C++, SQL
  • AI/ML: TensorFlow (our toolkit of choice for DL), Scikit-Learn (our go-to for most non-DL tasks)
  • Other Libraries: Pandas, NumPy, fbprophet, NLTK, SciPy, ffn, pyodbc, APScheduler
  • Database: SQL Server (considering a migration to PostgreSQL)
  • Warehouse: N/A (moving to Databricks)
  • ETL: Python, Airflow
  • Visualizations: Streamlit, Plotly (visualizing app performance), Altair (viz and dashboarding for new ideas), Tableau (internal business intelligence)
  • Hosting: Azure (core), Heroku (side projects & demos)
  • Tracking & Source Control: GitHub, Notion (keeping engineering, PMs and marketing synced up)

Spawner Portal:

  • Languages & Frameworks: (FE) React + Next.js, (BE) Python
  • Database: SQL Server
  • Hosting: Azure

Languages

We use Python for basically everything, most heavily for our modeling and ETL. When a component isn’t serving efficiently, or wasn’t built well enough to keep up, we consider rewriting it in C++ and keeping the Python version merely as a reference implementation. We’re very much a data company, so of course there’s SQL everywhere.
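One way to keep a Python version around as a reference implementation is to test the optimized code against it. The sketch below is hypothetical: `moving_average_reference` and `moving_average_fast` are illustrative names, and a prefix-sum rewrite in pure Python stands in for a real C++ port (which would typically be a compiled extension called from Python).

```python
import math
import random
from typing import List, Sequence


def moving_average_reference(values: Sequence[float], window: int) -> List[float]:
    """Reference version: obviously correct, re-sums every window (O(n * window))."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def moving_average_fast(values: Sequence[float], window: int) -> List[float]:
    """Hypothetical stand-in for a C++ port: prefix sums make each window O(1)."""
    if window <= 0:
        raise ValueError("window must be positive")
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [(prefix[i + window] - prefix[i]) / window for i in range(len(values) - window + 1)]


def check_against_reference(trials: int = 200, seed: int = 0) -> None:
    """Fuzz the fast version against the reference on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.uniform(-100.0, 100.0) for _ in range(rng.randint(1, 50))]
        window = rng.randint(1, len(data))
        expected = moving_average_reference(data, window)
        actual = moving_average_fast(data, window)
        assert len(expected) == len(actual)
        for e, a in zip(expected, actual):
            # Allow tiny floating-point differences between the two summation orders.
            assert math.isclose(e, a, rel_tol=1e-9, abs_tol=1e-9), (e, a)


if __name__ == "__main__":
    check_against_reference()
    print(moving_average_fast([1, 2, 3, 4, 5], 2))  # [1.5, 2.5, 3.5, 4.5]
```

The comparison harness stays the same if `moving_average_fast` is later swapped for a binding to compiled code, which is what makes the slow Python version worth keeping.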

AI/ML

We like TensorFlow for its great documentation and the large number of devs already familiar with it. That said, PyTorch is starting to make real headway, especially with all the great work Facebook Research has done recently. For now, TensorFlow makes up the majority of our stack, but I see nothing wrong with TF and PyTorch mingling in the future.

Scikit-Learn pops up all over the place. Its ease of use is undeniable, and it runs in production at companies across the industry. It’s really the bread and butter of most of our non-deep-learning ML work.

Frontends & Frameworks

Quite frankly, I’m not a frontend dev, so I won’t waste your time on this section. Our first hire liked Vue.js, so we originally went in that direction. He thought React/Next.js made more sense for another part of the codebase, and that’s how it got adopted. We’re incredibly pleased with the work our devs have done, and we love Next.js for its SEO friendliness.

Database

Our stack lives on Azure, so SQL Server seemed to make the most sense up front: the two are tightly integrated, which helps on both cost and ease of use. Other than the social pressure of “you’re not using MySQL or PostgreSQL???”, it’s doing everything we need, for now. We’re eyeing a potential move to PostgreSQL.

Data Warehouse

We’re slowly moving to Databricks as our data volume grows. I’m a huge Databricks fan, and also a big Snowflake fan, so the two partnering up is awesome. Databricks is spectacular, and I expect to have it fully online very soon.

Extract, Transform, Load (ETL)

Until recently, I was almost exclusively using Python, SQL, and SQLAlchemy for most ETL tasks. Someone I know forced me to check out Airflow, and all of a sudden it’s becoming part of our stack. Scheduling workflows feels a little more natural in Airflow than stringing together cron jobs.
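To make the cron-versus-Airflow point concrete, here is a minimal, hypothetical sketch of a Python + SQL ETL job whose extract, transform, and load steps run in dependency order rather than at staggered times. It is not Airflow’s API: the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) stands in for the scheduler’s DAG ordering, an in-memory `sqlite3` database stands in for SQL Server, and the `raw_orders` and `user_totals` tables are made up for illustration.

```python
import sqlite3
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

Step = Callable[[sqlite3.Connection, dict], None]


def extract(conn: sqlite3.Connection, ctx: dict) -> None:
    """Extract: pull raw rows from a hypothetical source table."""
    ctx["raw"] = conn.execute("SELECT user_id, amount FROM raw_orders").fetchall()


def transform(conn: sqlite3.Connection, ctx: dict) -> None:
    """Transform: total the order amounts per user in Python."""
    totals: Dict[int, float] = {}
    for user_id, amount in ctx["raw"]:
        totals[user_id] = totals.get(user_id, 0.0) + amount
    ctx["totals"] = sorted(totals.items())


def load(conn: sqlite3.Connection, ctx: dict) -> None:
    """Load: upsert the aggregates into a hypothetical reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_totals (user_id INTEGER PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO user_totals VALUES (?, ?)", ctx["totals"])
    conn.commit()


STEPS: Dict[str, Step] = {"extract": extract, "transform": transform, "load": load}
# Each step maps to the set of steps that must finish before it starts (the DAG edges).
DEPENDS_ON: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def run_pipeline(conn: sqlite3.Connection) -> List[str]:
    """Run every step in a dependency-respecting order and return that order."""
    ctx: dict = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        STEPS[name](conn, ctx)
    return order


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (user_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, 5.0), (1, 2.5)]
    )
    print(run_pipeline(conn))  # ['extract', 'transform', 'load']
    print(conn.execute("SELECT * FROM user_totals ORDER BY user_id").fetchall())
    # [(1, 12.5), (2, 5.0)]
```

With cron, `transform` would simply be scheduled some minutes after `extract` and hope it had finished; declaring the dependencies instead guarantees the order. In Airflow, the same dependencies would be declared between tasks in a DAG, and the scheduler would also handle retries and backfills, which this sketch does not attempt.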

Tracking & Source Control

We use GitHub. Shocker.

We use Notion, and I find myself using it for more than just project management and tracking. I use it for personal accountability and, really, as a technical diary. It lets me keep track of what I do each day, make sure I’m allocating time efficiently, and see what everyone on the team is up to and where I can help out or make someone’s life easier.

Visualizations

We love Streamlit; it helps us demo models and API endpoints in very little time. I dig Plotly and Altair at the moment for their ease of use on non-public projects. Plotly gives us a bunch of flexibility and features without much effort. Altair gives us more in-depth features and customization for extra effort.

For business metrics such as revenue, churn, and GA, we love throwing things into Tableau. It’s an easy way for our non-technical folks to dig straight into the analytics.

You can check out our product here.

Let’s continue the conversation on Twitter!