On GraphQL-to-SQL



GraphQL has a reputation for its N+1 problem, which often shows up in naive implementations. This leads a lot of us to try to solve GraphQL data fetching in the most efficient way possible. Besides the popular Dataloader approach, another common way of tackling the problem is taking a GraphQL query and deriving the exact SQL needed to resolve it:
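For context, the Dataloader approach tackles N+1 by batching: instead of issuing one SQL query per resolved object, the loader collects keys requested during a single tick of the event loop and fetches them with one batch call. Here is a minimal sketch of the idea in plain JavaScript (a stand-in for the real dataloader library, which also adds per-request caching; batchFetchUsers is a hypothetical database call):

```javascript
// Minimal batching loader: collects keys during one tick of the
// event loop, then resolves them all with a single batch call.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = [];
  }

  load(key) {
    return new Promise((resolve, reject) => {
      // Schedule a single flush for all keys queued this tick.
      if (this.queue.length === 0) process.nextTick(() => this.flush());
      this.queue.push({ key, resolve, reject });
    });
  }

  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const results = await this.batchFn(batch.map((item) => item.key));
      batch.forEach((item, i) => item.resolve(results[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

// Hypothetical batch function standing in for a single SQL query like:
// SELECT * FROM accounts WHERE id IN (...ids)
let queryCount = 0;
async function batchFetchUsers(ids) {
  queryCount++;
  return ids.map((id) => ({ id, name: `user-${id}` }));
}

const loader = new TinyLoader(batchFetchUsers);
// Three loads in the same tick result in a single batched "query".
Promise.all([loader.load(1), loader.load(2), loader.load(3)]).then((users) => {
  console.log(queryCount, users.map((u) => u.name).join(','));
  // logs: 1 user-1,user-2,user-3
});
```

Three naive resolvers would have issued three queries; the loader coalesces them into one. The GraphQL-to-SQL tools below try to go further and avoid even that round of batching by compiling the whole query ahead of time.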

// example from Join Monster
{                           SELECT                  
  user(id: 1) {               "user"."id",          
    idEncoded                 "user"."first_name",  
    fullName        ==>       "user"."last_name",    
    email                     "user"."email_address"
  }                         FROM "accounts" AS "user"
}                           WHERE "user"."id" = 1

For this particular query, the compiled statement is about as efficient as the data fetching can get. It's a really interesting idea, and it's not surprising that many people are aiming for something like this.

All these tools are slightly different, and the core problems they solve are not quite the same. However, they all rely heavily on compiling a GraphQL AST into a SQL statement that will hopefully be more performant than the combined SQL statements a naive GraphQL implementation would execute. I see a lot of effort going into these tools, but they never quite seemed to solve the GraphQL data fetching issues I see day-to-day.

Some are tied to a particular style of development

Hasura and PostGraphile are really solid products, and are different from a plain GraphQL-to-SQL compiler in the sense that they aim to be a complete engine for building applications. In both tools the database drives a lot of the application logic, although Hasura allows you to "stitch" your own custom resolvers into the database-backed schema, and PostGraphile handles custom use cases with PostgreSQL functions.

Honestly, both of these tools are an amazing fit for getting an "instant GraphQL API", especially when you don't want to deal with maintaining your own GraphQL server. That's the tradeoff here. However, for existing server applications interested in offering a GraphQL interface, it's less of a sweet spot. Same thing with Super Graph. It takes an existing database, introspects it, and generates a full GraphQL schema for it, including sorting and ordering. Useful if you want to scaffold something quickly, but definitely not that useful for large existing codebases with a lot of logic outside of the database.

Coupling

Other tools like Join Monster and SqlMancer take a slightly different approach. They don't necessarily need to introspect an existing database, decoupling the GraphQL schema definition from the database schema definition.

Tools like Join Monster (and Prisma Tools to a certain degree) need to somehow look at a GraphQL query, and decide what to fetch from the database.

For Join Monster, a requirement is that our object types map to database tables:

const User = new GraphQLObjectType({
  name: 'User',
  sqlTable: 'accounts',
  uniqueKey: 'id',
  fields: () => ({ /*...*/ })
})

The biggest hurdle for these tools is going beyond a 1:1 mapping between GraphQL fields and SQL columns. For example, how would you handle a GraphQL field called name, which actually relies on logic concatenating firstName and lastName? It would be very hard to make that happen magically, so Join Monster, for example, allows you to configure this behavior:

const User = new GraphQLObjectType({
  //...
  fields: () => ({
    fullName: {
      description: 'A user\'s first and last name',
      type: GraphQLString,
      // perhaps there is no 1-to-1 mapping of field to column
      // this field depends on multiple columns
      sqlDeps: [ 'first_name', 'last_name' ],
      resolve: user => `${user.first_name} ${user.last_name}`
    }
  })
})

The Join Monster API comes with quite a few constraints: our types must map to database tables, and we must know the SQL dependencies of our fields. Imagine the field PullRequest.mergeable on GitHub's GraphQL API. We can imagine this field needs data coming from git, data coming from the various "checks" running on the pull request, and data coming from different pull request reviews. Not only is that already more than a single table, but it's likely to evolve as more data is needed over time. This makes it really hard to come up with a sqlDeps configuration that wouldn't be brittle.
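To make that point concrete, here is what a resolver for such a field might plausibly look like. Everything here is invented for illustration (this is not GitHub's actual implementation, and context.git, context.checks, and context.reviews are hypothetical data-source helpers), but it shows why the field's dependencies can't be reduced to a list of columns on one table:

```javascript
// Hypothetical resolver for a field like PullRequest.mergeable.
// None of these data sources map cleanly to columns of a single SQL table.
async function resolveMergeable(pullRequest, _args, context) {
  // 1. Data from git itself, not from the relational database at all.
  const conflict = await context.git.hasConflicts(
    pullRequest.headRef,
    pullRequest.baseRef
  );
  if (conflict) return false;

  // 2. Data from a "checks" table, or possibly an external CI service.
  const checks = await context.checks.forPullRequest(pullRequest.id);
  if (checks.some((check) => check.status === 'failing')) return false;

  // 3. Data from reviews, possibly sitting behind application-level caching.
  const reviews = await context.reviews.forPullRequest(pullRequest.id);
  return !reviews.some((review) => review.state === 'changes_requested');
}

// Stubbed-out context just to show the shape of the call:
const fakeContext = {
  git: { hasConflicts: async () => false },
  checks: { forPullRequest: async () => [{ status: 'passing' }] },
  reviews: { forPullRequest: async () => [{ state: 'approved' }] },
};

resolveMergeable({ id: 42, headRef: 'feature', baseRef: 'main' }, {}, fakeContext)
  .then((mergeable) => console.log(mergeable)); // logs: true
```

Note that two of the three lookups can short-circuit and never run, depending on earlier results. A static sqlDeps declaration has no way to express any of this.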

It's quite difficult for an API as complex as GitHub's, or really any large application, to provide the exact database requirements needed to fulfill a use case. Sometimes data is loaded deep within existing logic, wrapped in conditionals. This also doesn't take into consideration application caching, or fields that make calls to other data sources or other services. Don't get me wrong, this is really challenging with a lazy loading approach too. But at least we don't have to maintain SQL configuration at our interface layer, which does seem better to me.

Beware of tools that make this seem too easy. Join Monster is really good at what it does, but as you can see it requires complex configuration to make things work outside the happy path. If a library doesn't have these configuration options and relies only on naming conventions, either it's an auto-generated GraphQL API from a database schema, or it's probably too naive to handle more complex cases (or maybe it has some AI to analyze your data dependencies, who knows 🙊).

Is one large query always better?

Is generating one giant SQL statement always faster than a few smaller SQL statements? That's far from a new question, and it's obviously very hard to answer generically. However, we can certainly say it is not always the case. For questions like these, I always turn to High Performance MySQL. I don't want to spoil the book here, but it highlights many reasons why a giant SQL query is not always the better choice. A lot of applications can benefit from trying "join decomposition", which potentially gives us a few advantages:

  • More potential for cache hits
  • Sometimes reduces lock contention
  • IN() queries can sometimes be faster than complex JOINs
  • And a lot more things that are explained in that book!
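As a sketch of what join decomposition looks like, adapted from the kind of example High Performance MySQL discusses (the table and column names are made up for illustration), a single multi-join query can be split into smaller queries, with the final one using IN():

```sql
-- One large join:
SELECT * FROM tag
  JOIN tag_post ON tag_post.tag_id = tag.id
  JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';

-- Decomposed into smaller queries, the last one using IN():
SELECT id FROM tag WHERE tag = 'mysql';
SELECT post_id FROM tag_post WHERE tag_id = 1234;
SELECT * FROM post WHERE id IN (123, 456, 567, 9098, 8904);
```

Each small query is individually cacheable, and executing the lookups separately can reduce lock contention and let the work spread across replicas.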

Some GraphQL-to-SQL tools could probably be smart enough to do this, but that's definitely more of a black box than optimizing your dataloaders. All that to say: database performance is not always as simple as making one large SQL statement.

Data fetching remains something to improve

While I would not suggest most of these tools for those who have existing and complex codebases, I do think there is room for a lot of improvement in GraphQL execution. While GraphQL is datastore agnostic, the nature of its execution does impose certain constraints on how we structure application logic. Ideally, things would Just Work™. Prisma Client is exciting on that front, if it keeps getting better and better. Maybe another solution we'll see is some kind of proxy, à la ProxySQL, where the application can use the database without worrying about GraphQL execution, and the proxy somehow makes sense of the queries and turns them into performant ones.

If you're interested in bootstrapping a new API and you're comfortable with the database server as the centerpiece, I can only recommend Hasura and PostGraphile; they're both great. For existing codebases, and especially large/complex ones, I'm sticking with my recommendation of a Dataloader/lazy/asynchronous loading approach for GraphQL APIs over tools that aim for ahead-of-time SQL generation.

If you've used some of these tools on existing complex codebases, I'd love to hear from you.

