Phoenix Deep Dive | Phoenix Source Code Analysis: SQL



SQL Compilation and Execution

In the previous chapter we briefly analyzed the SQL compilation process from the bottom up; this chapter starts from the Driver.

Readers who have done database CRUD through JDBC will certainly recognize:

  • Driver

  • Connection

  • PreparedStatement/Statement

  • ResultSet

Phoenix has corresponding implementations of each of these concepts, and we will analyze them one by one from the top down.
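As a refresher, here is a minimal sketch of that flow against Phoenix (the ZooKeeper quorum and table name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal JDBC round trip against Phoenix; URL and table are made up.
public class PhoenixJdbcExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM MY_TABLE LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
        }
    }
}

Starting at the top of that stack, the Driver: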

/**
  * 
  * JDBC Driver implementation of Phoenix for production.
  * To use this driver, specify the following URL:
  *     jdbc:phoenix:<zookeeper quorum server name>;
  * Only an embedded driver is currently supported (Phoenix client
  * runs in the same JVM as the driver). Connections are lightweight
  * and are not pooled. The last part of the URL, the hbase zookeeper
  * quorum server name, determines the hbase cluster to which queries
  * will be routed.
  */
public final class PhoenixDriver extends PhoenixEmbeddedDriver {
       ...
}
/**
  * 
  * Abstract base class for JDBC Driver implementation of Phoenix
  */
@Immutable
public abstract class PhoenixEmbeddedDriver implements Driver, SQLCloseable {
       ...
}

PhoenixDriver is such a Driver implementation, providing the corresponding functionality. For now we only need to look at its connect method, which ultimately returns a PhoenixConnection.

As the PhoenixDriver javadoc notes, PhoenixConnection is very lightweight and needs no pooling, meaning connections can be created freely. We won't elaborate on this point; interested readers can look it up.

/**
  * 
  * JDBC Connection implementation of Phoenix. Currently the following are
  * supported: - Statement - PreparedStatement The connection may only be used
  * with the following options: - ResultSet.TYPE_FORWARD_ONLY -
  * Connection.TRANSACTION_READ_COMMITTED
  */
public class PhoenixConnection implements Connection, MetaDataMutated, SQLCloseable {
      ...
}
/**
  * 
  * Interface for applying schema mutations to our client-side schema cache
  */
public interface MetaDataMutated {
       void addTable(PTable table, long resolvedTime) throws SQLException;
       void updateResolvedTimestamp(PTable table, long resolvedTimestamp) throws SQLException;
       void removeTable(PName tenantId, String tableName, String parentTableName, long tableTimeStamp) throws SQLException;
       void removeColumn(PName tenantId, String tableName, List<PColumn> columnsToRemove, long tableTimeStamp, long tableSeqNum, long resolvedTime) throws SQLException;
       void addFunction(PFunction function) throws SQLException;
       void removeFunction(PName tenantId, String function, long functionTimeStamp) throws SQLException;
       void addSchema(PSchema schema) throws SQLException;
       void removeSchema(PSchema schema, long schemaTimeStamp);
}

For PhoenixConnection we only care about two methods, prepareStatement and createStatement. By their definitions they return, respectively (a short usage sketch follows the list):

  • PhoenixPreparedStatement

  • PhoenixStatement
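A minimal sketch of both in use (the table name is hypothetical; the connection URL as before):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// prepareStatement returns a PhoenixPreparedStatement under the hood.
public class PreparedExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             PreparedStatement ps =
                     conn.prepareStatement("SELECT name FROM MY_TABLE WHERE id = ?")) {
            ps.setInt(1, 42); // bind the single parameter
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}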

So far none of this touches SQL compilation or execution; it is all object creation and type conversion. To keep the analysis simple, we will focus on PhoenixStatement below.

/**
  * 
  * JDBC Statement implementation of Phoenix.
  * Currently only the following methods are supported:
  * - {@link #executeQuery(String)}
  * - {@link #executeUpdate(String)}
  * - {@link #execute(String)}
  * - {@link #getResultSet()}
  * - {@link #getUpdateCount()}
  * - {@link #close()}
  * The Statement only supports the following options:
  * - ResultSet.FETCH_FORWARD
  * - ResultSet.TYPE_FORWARD_ONLY
  * - ResultSet.CLOSE_CURSORS_AT_COMMIT
  */
public class PhoenixStatement implements Statement, SQLCloseable {
      ...
}

The javadoc is clear: only a limited set of methods is supported, which focuses our analysis. Given the goal of this series, executeQuery is the method that matters most.

public ResultSet executeQuery(String sql) throws SQLException {
       if (LOGGER.isDebugEnabled()) {
           LOGGER.debug(LogUtil.addCustomAnnotations("Execute query: " + sql, connection));
       }

       CompilableStatement stmt = parseStatement(sql);
       if (stmt.getOperation().isMutation()) {
           throw new ExecuteQueryNotApplicableException(sql);
       }
       return executeQuery(stmt,createQueryLogger(stmt,sql));
}

The logic of executeQuery is simple: parse the SQL string into a CompilableStatement, then hand it to the other executeQuery overload for execution.

protected CompilableStatement parseStatement(String sql) throws SQLException {
          PhoenixStatementParser parser = null;
          try {
               parser = new PhoenixStatementParser(sql, new ExecutableNodeFactory());
          } catch (IOException e) {
            throw ServerUtil.parseServerException(e);
          }
          CompilableStatement statement = parser.parseStatement();
          return statement;
}

parseStatement:

  • creates a PhoenixStatementParser

  • parses the SQL string

PhoenixStatementParser is a subclass of SQLParser; in effect it simply delegates to SQLParser to parse the SQL.

Since this parseStatement call comes from executeQuery, which rejects mutations, the statement being parsed is necessarily a query.

/**
 * Parses the input as a SQL select or upsert statement.
 * @throws SQLException 
 */
public BindableStatement parseStatement() throws SQLException {
       try {
            BindableStatement statement = parser.statement();
            return statement;
       } catch (RecognitionException e) {
            throw PhoenixParserException.newException(e, parser.getTokenNames());
       } catch (UnsupportedOperationException e) {
            throw new SQLFeatureNotSupportedException(e);
       } catch (RuntimeException e) {
            if (e.getCause() instanceof SQLException) {
                throw (SQLException) e.getCause();
            }
            throw PhoenixParserException.newException(e, parser.getTokenNames());
       }
}

From this logic we can see that such statements are defined by the statement rule in the grammar file. At this point the SQL string has been parsed; nothing has executed yet.
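To watch the parser in isolation, we can feed it a SQL string directly. A sketch, assuming SQLParser's public single-argument constructor (which uses the default ParseNodeFactory rather than the ExecutableNodeFactory seen above):

import org.apache.phoenix.parse.BindableStatement;
import org.apache.phoenix.parse.SQLParser;

// Parse-only experiment: no connection, no execution.
public class ParseOnly {
    public static void main(String[] args) throws Exception {
        BindableStatement stmt =
                new SQLParser("SELECT id FROM MY_TABLE WHERE id > 10").parseStatement();
        System.out.println(stmt.getOperation()); // QUERY for a SELECT
    }
}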

private PhoenixResultSet executeQuery(final CompilableStatement stmt,
        final boolean doRetryOnMetaNotFoundError, final QueryLogger queryLogger) throws SQLException {
       ...
}

Following the code, we arrive at the method above, which is where the CompilableStatement is finally compiled and executed.

Continuing the analysis, we find that the CompilableStatement is compiled into a QueryPlan, the QueryPlan is optimized by connection.getQueryServices().getOptimizer().optimize, and a PhoenixResultSet is created from the resulting plan.

An attentive reader may ask: where does the compiled SQL turn into an HBase Scan? To answer that, we need to find where statement gets translated. The grammar file shows that statement is defined by oneStatement, whose syntax diagram is shown below:

[Figure: syntax diagram of the oneStatement rule]

Clearly we need to look at the grammar for select_node:

single_select returns [SelectStatement ret]
@init{ contextStack.push(new ParseContext()); }
    :   SELECT (h=hintClause)? 
        (d=DISTINCT | ALL)? sel=select_list
        (FROM from=parseFrom)?
        (WHERE where=expression)?
        (GROUP BY group=group_by)?
        (HAVING having=expression)?
        { ParseContext context = contextStack.peek(); $ret = factory.select(from, h, d!=null, sel, where, group, having, null, null,null, getBindCount(), context.isAggregate(), context.hasSequences(), null, new HashMap<String,UDFParseNode>(udfParseNodes)); }
    ;
finally{ contextStack.pop(); }

unioned_selects returns [List<SelectStatement> ret]
@init{ret = new ArrayList<SelectStatement>();}
    :   s=single_select {ret.add(s);} (UNION ALL s=single_select {ret.add(s);})*
    ;

// Parse a full select expression structure.
select_node returns [SelectStatement ret]
@init{ contextStack.push(new ParseContext()); }
    :   u=unioned_selects
        (ORDER BY order=order_by)?
        (LIMIT l=limit)?
        (OFFSET o=offset (ROW | ROWS)?)?
        (FETCH (FIRST | NEXT) (l=limit)? (ROW | ROWS) ONLY)?
        { ParseContext context = contextStack.peek(); $ret = factory.select(u, order, l, o, getBindCount(), context.isAggregate()); }
    ;
finally{ contextStack.pop(); }

As the grammar above shows, after syntactic analysis the SQL goes through factory.select for semantic transformation, producing a SelectStatement object.

One thing deserves special attention here: the concrete type of factory. In parseStatement, the second argument passed to PhoenixStatementParser is an ExecutableNodeFactory. That is factory's concrete type, so the actual implementation of select is supplied by ExecutableNodeFactory. Its select method follows.

public ExecutableSelectStatement select(TableNode from, HintNode hint, boolean isDistinct, List<AliasedNode> select, ParseNode where,
                List<ParseNode> groupBy, ParseNode having, List<OrderByNode> orderBy, LimitNode limit, OffsetNode offset, int bindCount, boolean isAggregate,
                boolean hasSequence, List<SelectStatement> selects, Map<String, UDFParseNode> udfParseNodes) {
       return new ExecutableSelectStatement(from, hint, isDistinct, select, where, groupBy == null ? Collections.<ParseNode>emptyList() : groupBy,
                    having, orderBy == null ? Collections.<OrderByNode>emptyList() : orderBy, limit, offset, bindCount, isAggregate, hasSequence, selects == null ? Collections.<SelectStatement>emptyList() : selects, udfParseNodes);
}

In other words, the actual type of the parsed SelectStatement is ExecutableSelectStatement.

compilePlan is likewise implemented by ExecutableSelectStatement:

public QueryPlan compilePlan(PhoenixStatement stmt, Sequence.ValueOp seqAction) throws SQLException {
       if(!getUdfParseNodes().isEmpty()) {
           stmt.throwIfUnallowedUserDefinedFunctions(getUdfParseNodes());
       }
       SelectStatement select = SubselectRewriter.flatten(this, stmt.getConnection());
       ColumnResolver resolver = FromCompiler.getResolverForQuery(select, stmt.getConnection());
       select = StatementNormalizer.normalize(select, resolver);
       SelectStatement transformedSelect = SubqueryRewriter.transform(select, resolver, stmt.getConnection());
       if (transformedSelect != select) {
           resolver = FromCompiler.getResolverForQuery(transformedSelect, stmt.getConnection());
           select = StatementNormalizer.normalize(transformedSelect, resolver);
       }

       QueryPlan plan = new QueryCompiler(stmt, select, resolver, Collections.emptyList(),
                    stmt.getConnection().getIteratorFactory(), new SequenceManager(stmt),
                    true, false, null)
                    .compile();
       plan.getContext().getSequenceManager().validateSequences(seqAction);
       return plan;
}

This code is very important, in particular these calls:

  • SubselectRewriter.flatten

  • FromCompiler.getResolverForQuery

  • SubqueryRewriter.transform

In short, this code uses the connection to look up the table schema and attaches table and column information to the ExecutableSelectStatement.
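As a hedged sketch of what that resolution looks like when driven by hand (URL and table are placeholders; the calls mirror the ones in compilePlan above):

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.phoenix.compile.ColumnResolver;
import org.apache.phoenix.compile.FromCompiler;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.parse.SQLParser;
import org.apache.phoenix.parse.SelectStatement;

// Resolve table/column metadata for a query without executing it.
public class ResolveSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")) {
            PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
            SelectStatement select = (SelectStatement)
                    new SQLParser("SELECT id FROM MY_TABLE").parseStatement();
            ColumnResolver resolver = FromCompiler.getResolverForQuery(select, pconn);
            // the resolver now knows the PTable behind MY_TABLE
            System.out.println(resolver.getTables().get(0).getTable().getName());
        }
    }
}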

To keep things simple we won't dig into their internals either, and will only look at the following three statements:

QueryPlan plan = new QueryCompiler(stmt, select, resolver, Collections.emptyList(),
                    stmt.getConnection().getIteratorFactory(), new SequenceManager(stmt),
                    true, false, null)
                    .compile();
plan.getContext().getSequenceManager().validateSequences(seqAction);
return plan;

These lines show that the final plan is produced by a QueryCompiler, via its compile method.

Analyzing compile, we learn that for a simple select a ScanPlan is ultimately created. There are many other plan types, of course; we pick the simplest here, and interested readers can work through the other scenarios as their own needs dictate. The plan-selection logic inside compile looks like this:

QueryPlan plan = innerPlan;
QueryPlan dataPlan = dataPlans.get(tableRef);
if (plan == null) {
    ParallelIteratorFactory parallelIteratorFactory = asSubquery ? null : this.parallelIteratorFactory;
    plan = select.getFrom() == null
            ? new LiteralResultIterationPlan(context, select, tableRef, projector, limit, offset, orderBy,
               parallelIteratorFactory)
                    : (select.isAggregate() || select.isDistinct()
                            ? new AggregatePlan(context, select, tableRef, projector, limit, offset, orderBy,
                                    parallelIteratorFactory, groupBy, having, dataPlan)
                            : new ScanPlan(context, select, tableRef, projector, limit, offset, orderBy,
                                    parallelIteratorFactory, allowPageFilter, dataPlan));
}
SelectStatement planSelect = asSubquery ? select : this.select;
if (!subqueries.isEmpty()) {
     int count = subqueries.size();
     WhereClauseSubPlan[] subPlans = new WhereClauseSubPlan[count];
     int i = 0;
     for (SubqueryParseNode subqueryNode : subqueries) {
          SelectStatement stmt = subqueryNode.getSelectNode();
          subPlans[i++] = new WhereClauseSubPlan(compileSubquery(stmt, false), stmt, subqueryNode.expectSingleRow());
     }
     plan = HashJoinPlan.create(planSelect, plan, null, subPlans);
}

if (innerPlan != null) {
    if (LiteralExpression.isTrue(where)) {
        where = null; // we do not pass "true" as filter
    }
    plan = select.isAggregate() || select.isDistinct()
              ? new ClientAggregatePlan(context, planSelect, tableRef, projector, limit, offset, where, orderBy,
                            groupBy, having, plan)
                    : new ClientScanPlan(context, planSelect, tableRef, projector, limit, offset, where, orderBy, plan);

}
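Incidentally, there is an easy way to see which plan a query ends up with, without attaching a debugger: Phoenix's EXPLAIN statement (a sketch; URL and table are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Print the plan Phoenix chose for a query.
public class ExplainExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("EXPLAIN SELECT * FROM MY_TABLE WHERE id = 1")) {
            while (rs.next()) {
                // one plan step per row, e.g. "CLIENT 1-CHUNK PARALLEL 1-WAY ..."
                System.out.println(rs.getString(1));
            }
        }
    }
}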

Here is part of the ScanPlan source:

/**
  *
  * Query plan for a basic table scan
  */
public class ScanPlan extends BaseQueryPlan {
      private static final Logger LOGGER = LoggerFactory.getLogger(ScanPlan.class);
      private List<KeyRange> splits;
      private List<List<Scan>> scans;
      private boolean allowPageFilter;
      private boolean isSerial;
      private boolean isDataToScanWithinThreshold;
      private Long serialRowsEstimate;
      private Long serialBytesEstimate;
      private Long serialEstimateInfoTs;
      private OrderBy actualOutputOrderBy;
    ...
}

Here, at last, is the Scan-typed object we have been waiting for: at this point the SQL string has truly become directly executable code. Pay special attention to this class's newIterator method, because it is what ultimately returns the ResultIterator, which is essentially a wrapper around Scans. The Scans themselves are generated by SerialIterators or ParallelIterators; we won't chase those details either. Now, back inside executeQuery:

ResultIterator resultIterator = plan.iterator();
if (LOGGER.isDebugEnabled()) {
    String explainPlan = QueryUtil.getExplainPlan(resultIterator);
    LOGGER.debug(LogUtil.addCustomAnnotations("Explain plan: " + explainPlan, connection));
}
StatementContext context = plan.getContext();
context.setQueryLogger(queryLogger);
if(queryLogger.isDebugEnabled()){
   queryLogger.log(QueryLogInfo.EXPLAIN_PLAN_I, 
   QueryUtil.getExplainPlan(resultIterator));
   queryLogger.log(QueryLogInfo.GLOBAL_SCAN_DETAILS_I, context.getScan()!=null?context.getScan().toString():null);
}
context.getOverallQueryMetrics().startQuery();
PhoenixResultSet rs = newResultSet(resultIterator, plan.getProjector(), plan.getContext());
resultSets.add(rs);
setLastQueryPlan(plan);
setLastResultSet(rs);
setLastUpdateCount(NO_UPDATE);
setLastUpdateOperation(stmt.getOperation());
// If transactional, this will move the read pointer forward
if (connection.getAutoCommit()) {
    connection.commit();
}
connection.incrementStatementExecutionCounter();
return rs;

As we can see, this is where the concrete, executable, iterable ResultIterator is produced; it is then wrapped in a PhoenixResultSet and returned.
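Conceptually, what the ResultIterator wraps is one or more HBase Scans. As a rough illustration (not Phoenix's actual code), a leading row-key predicate ends up as scan boundaries:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only: SELECT * FROM MY_TABLE WHERE pk >= 'a001' AND pk < 'a100'
// (pk being the leading row-key column) boils down to a bounded Scan.
public class ScanSketch {
    public static Scan rowKeyRangeScan() {
        return new Scan()
                .withStartRow(Bytes.toBytes("a001"))  // inclusive lower bound
                .withStopRow(Bytes.toBytes("a100"));  // exclusive upper bound
    }
}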

The PhoenixResultSet code is comparatively simple: callers invoke next and the get*** methods to process the returned data, so we won't analyze it further.

At this point we have a rough picture of the SQL compilation process. We used only the simplest kind of query as an example and did not dive into the other scenarios; interested readers can explore them on their own. Simple as the example is, it does not stop us from grasping the overall flow, and it is of real help for extending Phoenix with full-text search: at the very least, the final iterator could no longer be translated into a plain Scan.

Using Index Tables

The previous chapter briefly covered SQL compilation and execution, but exactly how Phoenix uses an index is still unclear; a brief analysis follows. Where should we start?

Anyone who has used Phoenix indexes knows that without a hint Phoenix will not use an index by default (unless the query touches only columns contained in the index table). So hint parsing is clearly the place to start, or at least a good foothold.
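For concreteness, this is what a hinted query looks like (table and index names are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// The INDEX hint nudges the optimizer toward a global index even when the
// query selects columns the index does not cover.
public class HintedQuery {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT /*+ INDEX(MY_TABLE MY_IDX) */ id, payload "
                   + "FROM MY_TABLE WHERE name = 'foo'";
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getInt(1) + " " + rs.getString(2));
            }
        }
    }
}

With that in mind, back to the grammar: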

single_select returns [SelectStatement ret]
@init{ contextStack.push(new ParseContext()); }
    :   SELECT (h=hintClause)? 
        (d=DISTINCT | ALL)? sel=select_list
        (FROM from=parseFrom)?
        (WHERE where=expression)?
        (GROUP BY group=group_by)?
        (HAVING having=expression)?
        { ParseContext context = contextStack.peek(); $ret = factory.select(from, h, d!=null, sel, where, group, having, null, null,null, getBindCount(), context.isAggregate(), context.hasSequences(), null, new HashMap<String,UDFParseNode>(udfParseNodes)); }
    ;
finally{ contextStack.pop(); }

From the grammar above we can see that hintClause is the second argument to factory.select. We analyzed this select in the previous section and won't revisit it; the only thing to note here is that the hint in the SQL string is compiled into a HintNode.

/**
  * Node representing optimizer hints in SQL
  */
public class HintNode {
       ...
}

Inside HintNode there is a Hint enum, which defines INDEX:

/**
  * Hint of the form INDEX(<table_name> <index_name>...)
  * to suggest usage of the index if possible. The first
  * usable index in the list of indexes will be choosen.
  * Table and index names may be surrounded by double quotes
  * if they are case sensitive.
  */
INDEX,

We focus on the Hint enum mainly to speed up the source walkthrough. Readers of the previous section will remember that after the QueryPlan is compiled there is still a QueryOptimizer step; if our guess is right, that is where the index-using logic gets generated. Using the same old trick, we search the codebase for usages of Hint.INDEX.

[Figure: search results for usages of Hint.INDEX]

Clearly it is handled in QueryOptimizer.

QueryPlan plan = stmt.compilePlan(PhoenixStatement.this, Sequence.ValueOp.VALIDATE_SEQUENCE);
// Send mutations to hbase, so they are visible to subsequent reads.
// Use original plan for data table so that data and immutable indexes will be sent
// TODO: for joins, we need to iterate through all tables, but we need the original table,
// not the projected table, so plan.getContext().getResolver().getTables() won't work.
Iterator<TableRef> tableRefs = plan.getSourceRefs().iterator();
connection.getMutationState().sendUncommitted(tableRefs);
plan = connection.getQueryServices().getOptimizer().optimize(PhoenixStatement.this, plan);
// this will create its own trace internally, so we don't wrap this
// whole thing in tracing
ResultIterator resultIterator = plan.iterator();
if (LOGGER.isDebugEnabled()) {
    String explainPlan = QueryUtil.getExplainPlan(resultIterator);
    LOGGER.debug(LogUtil.addCustomAnnotations("Explain plan: " + explainPlan, connection));
}

The QueryOptimizer.optimize method optimizes the QueryPlan, so it deserves careful analysis. Reading through the source brings us to the following method:

private List<QueryPlan> getApplicablePlans(QueryPlan dataPlan, PhoenixStatement statement, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, boolean stopAtBestPlan) throws SQLException {
        if (!useIndexes) {
            return Collections.singletonList(dataPlan);
        }

        SelectStatement select = (SelectStatement) dataPlan.getStatement();
        if (!select.isUnion()
                && !select.isJoin()
                && select.getInnerSelectStatement() == null
                && (select.getWhere() == null || !select.getWhere().hasSubquery())) {
            return getApplicablePlansForSingleFlatQuery(dataPlan, statement, targetColumns, parallelIteratorFactory, stopAtBestPlan);
        }

        ColumnResolver resolver = FromCompiler.getResolverForQuery(select, statement.getConnection());
        Map<TableRef, QueryPlan> dataPlans = null;

        // Find the optimal index plan for each join tables in a join query or a
        // non-correlated sub-query, then rewrite the query with found index tables.
        if (select.isJoin()
                || (select.getWhere() != null && select.getWhere().hasSubquery())) {
            JoinCompiler.JoinTable join = JoinCompiler.compile(statement, select, resolver);
            Map<TableRef, TableRef> replacement = null;
            for (JoinCompiler.Table table : join.getTables()) {
                if (table.isSubselect())
                    continue;
                TableRef tableRef = table.getTableRef();
                SelectStatement stmt = table.getAsSubqueryForOptimization(tableRef.equals(dataPlan.getTableRef()));
                // Replace non-correlated sub-queries in WHERE clause with dummy values
                // so the filter conditions can be taken into account in optimization.
                if (stmt.getWhere() != null && stmt.getWhere().hasSubquery()) {
                    StatementContext context =
                            new StatementContext(statement, resolver, new Scan(), new SequenceManager(statement));;
                    ParseNode dummyWhere = GenSubqueryParamValuesRewriter.replaceWithDummyValues(stmt.getWhere(), context);
                    stmt = FACTORY.select(stmt, dummyWhere);
                }
                // TODO: It seems inefficient to be recompiling the statement again inside of this optimize call
                QueryPlan subDataPlan =
                        new QueryCompiler(
                                statement, stmt,
                                FromCompiler.getResolverForQuery(stmt, statement.getConnection()),
                                false, false, null)
                                .compile();
                QueryPlan subPlan = optimize(statement, subDataPlan);
                TableRef newTableRef = subPlan.getTableRef();
                if (!newTableRef.equals(tableRef)) {
                    if (replacement == null) {
                        replacement = new HashMap<TableRef, TableRef>();
                        dataPlans = new HashMap<TableRef, QueryPlan>();
                    }
                    replacement.put(tableRef, newTableRef);
                    dataPlans.put(newTableRef, subDataPlan);
                }
            }

            if (replacement != null) {
                select = rewriteQueryWithIndexReplacement(
                        statement.getConnection(), resolver, select, replacement);
                resolver = FromCompiler.getResolverForQuery(select, statement.getConnection());
            }
        }

        // Re-compile the plan with option "optimizeSubquery" turned on, so that enclosed
        // sub-queries can be optimized recursively.
        QueryCompiler compiler = new QueryCompiler(statement, select, resolver,
                targetColumns, parallelIteratorFactory, dataPlan.getContext().getSequenceManager(),
                true, true, dataPlans);
        return Collections.singletonList(compiler.compile());
}

For simplicity we go straight to getApplicablePlansForSingleFlatQuery, which contains one very important call: IndexStatementRewriter.translate.

/**
 * Rewrite the parse node by translating all data table column references to   
 * references to the corresponding index column.
 * @param node the parse node
 * @param dataResolver the column resolver
 * @return new parse node or the same one if nothing was rewritten.
 * @throws SQLException 
 */
public static ParseNode translate(ParseNode node, ColumnResolver dataResolver) throws SQLException {
       return rewrite(node, new IndexStatementRewriter(dataResolver, null, false));
}

Again we only point out this function's importance without analyzing its logic, because there is another very important method: getHintedQueryPlan.

private QueryPlan getHintedQueryPlan(PhoenixStatement statement, SelectStatement select, List<PTable> indexes, List<? extends PDatum> targetColumns, ParallelIteratorFactory parallelIteratorFactory, List<QueryPlan> plans) throws SQLException {
        QueryPlan dataPlan = plans.get(0);
        String indexHint = select.getHint().getHint(Hint.INDEX);
        if (indexHint == null) {
            return null;
        }
        int startIndex = 0;
        String alias = dataPlan.getTableRef().getTableAlias();
        String prefix = HintNode.PREFIX + (alias == null ? dataPlan.getTableRef().getTable().getName().getString() : alias) + HintNode.SEPARATOR;
        while (startIndex < indexHint.length()) {
            startIndex = indexHint.indexOf(prefix, startIndex);
            if (startIndex < 0) {
                return null;
            }
            startIndex += prefix.length();
            boolean done = false; // true when SUFFIX found
            while (startIndex < indexHint.length() && !done) {
                int endIndex;
                int endIndex1 = indexHint.indexOf(HintNode.SEPARATOR, startIndex);
                int endIndex2 = indexHint.indexOf(HintNode.SUFFIX, startIndex);
                if (endIndex1 < 0 && endIndex2 < 0) { // Missing SUFFIX shouldn't happen
                    endIndex = indexHint.length();
                } else if (endIndex1 < 0) {
                    done = true;
                    endIndex = endIndex2;
                } else if (endIndex2 < 0) {
                    endIndex = endIndex1;
                } else {
                    endIndex = Math.min(endIndex1, endIndex2);
                    done = endIndex2 == endIndex;
                }
                String indexName = indexHint.substring(startIndex, endIndex);
                int indexPos = getIndexPosition(indexes, indexName);
                if (indexPos >= 0) {
                    // Hinted index is applicable, so return it's index
                    PTable index = indexes.get(indexPos);
                    indexes.remove(indexPos);
                    QueryPlan plan = addPlan(statement, select, index, targetColumns, parallelIteratorFactory, dataPlan, true);
                    if (plan != null) {
                        return plan;
                    }
                }
                startIndex = endIndex + 1;
            }
        }
        return null;
}

This method is long, but all it does is parse and process the hint string. Personally I think this could still be improved: if the parsing rules for hints were defined in the grammar file, none of this manual scanning would be needed. Below is the grammar definition of the hint, which also supports this view.

[Figure: grammar definition of the hint clause]
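To make the string scanning above concrete, here is a simplified standalone re-implementation (it assumes HintNode.PREFIX is "(", SEPARATOR is a single space, and SUFFIX is ")", which should be verified against the source):

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the scan in getHintedQueryPlan: pull the index names
// hinted for one table out of the raw hint text.
public class HintScanSketch {
    static List<String> indexNamesFor(String indexHint, String table) {
        List<String> names = new ArrayList<>();
        String prefix = "(" + table + " ";          // HintNode.PREFIX + table + SEPARATOR
        int start = indexHint.indexOf(prefix);
        if (start < 0) return names;                // no hint for this table
        start += prefix.length();
        int end = indexHint.indexOf(')', start);    // HintNode.SUFFIX
        if (end < 0) end = indexHint.length();
        for (String name : indexHint.substring(start, end).split(" ")) {
            if (!name.isEmpty()) names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hint text as produced from /*+ INDEX(MY_TABLE IDX_A IDX_B) */
        System.out.println(indexNamesFor("(MY_TABLE IDX_A IDX_B)", "MY_TABLE"));
        // -> [IDX_A, IDX_B]
    }
}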

The most important thing getHintedQueryPlan does is call addPlan, which is itself fairly complex.

Briefly, its logic is to reconstruct the SelectStatement from the index and the current select query, rewrite it by calling ParseNodeRewriter.rewrite, and finally compile it into executable form with QueryCompiler.

The second argument to ParseNodeRewriter.rewrite is crucial: it carries out the concrete rewriting logic. If we add full-text search, we will probably need to extend this class's functionality during the rewrite.

/**
  * Used to replace parse nodes in a SelectStatement that match expressions that are present in an indexed with the
  * corresponding {@link ColumnParseNode}
  */
public class IndexExpressionParseNodeRewriter extends ParseNodeRewriter {
       ...
}

This class defines an indexedParseNodeToColumnParseNodeMap and overrides leaveCompoundNode:

protected ParseNode leaveCompoundNode(CompoundParseNode node, List<ParseNode> children, CompoundNodeFactory factory) {
          return indexedParseNodeToColumnParseNodeMap.containsKey(node) ? indexedParseNodeToColumnParseNodeMap.get(node)
                : super.leaveCompoundNode(node, children, factory);
}

Simply put, it rewrites the syntax tree. Exactly how is rather involved, and we won't study it here.
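If we do add full-text search later, a rewriter skeleton following the same pattern might look like this. It is entirely hypothetical; like IndexExpressionParseNodeRewriter it would have to live in the org.apache.phoenix.parse package to access the package-level CompoundNodeFactory type:

package org.apache.phoenix.parse;

import java.util.List;
import java.util.Map;

// Hypothetical: swap matched sub-trees for full-text-search parse nodes,
// mirroring how IndexExpressionParseNodeRewriter substitutes index columns.
public class FullTextParseNodeRewriter extends ParseNodeRewriter {
    private final Map<ParseNode, ParseNode> fullTextReplacements;

    public FullTextParseNodeRewriter(Map<ParseNode, ParseNode> fullTextReplacements) {
        this.fullTextReplacements = fullTextReplacements;
    }

    @Override
    protected ParseNode leaveCompoundNode(CompoundParseNode node,
            List<ParseNode> children, CompoundNodeFactory factory) {
        ParseNode replacement = fullTextReplacements.get(node);
        return replacement != null
                ? replacement
                : super.leaveCompoundNode(node, children, factory);
    }
}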

That's as far as we'll go today. Although many details remain unexplored, this does not stop us from understanding how Phoenix uses indexes: at minimum we know it is achieved by rewriting the QueryPlan. Later we will use live debugging to trace exactly how Phoenix rewrites the QueryPlan, so that when we extend it with full-text indexing we will know where to start.

-- To be continued

About the Author

Wu Shaojie, a veteran programmer who loves the technologies and frameworks of the big data ecosystem, is familiar with data warehouse architecture and real-time computing, and currently works on big data development and architecture.
