Spark SQL(6) OptimizedPlan

Spark SQL(6) OptimizedPlanSpark SQL(6) OptimizedPlan 在这一步spark sql主要应用一些规则,优化生成的Resolved Plan,这一步涉及到的有Optimizer。 之前介绍在sparkse…

	Spark SQL(6) OptimizedPlan[数据库教程]

Spark SQL(6) OptimizedPlan

在这一步spark sql主要应用一些规则,优化生成的Resolved Plan,这一步涉及到的有Optimizer。

之前介绍在sparksession实例化的是会实例化sessionState,进而确定QueryExecution、Analyzer,Optimizer也是在这一步确定的:

  protected def optimizer: Optimizer = {
    new SparkOptimizer(catalog, experimentalMethods) {
      override def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] =
        super.extendedOperatorOptimizationRules ++ customOperatorOptimizationRules
    }
  }

代码100分

  Optimizer也是RuleExecutor的子类,而SparkOptimizer是Optimizer子类,在analyzed步骤知道,其实主要规则就是RuleExecutor子类定义的batchs的规则

sparkOptimizer:

代码100分 override def batches: Seq[Batch] = (preOptimizationBatches ++ super.batches :+
    Batch("Optimize Metadata Only Query", Once, OptimizeMetadataOnlyQuery(catalog)) :+
    Batch("Extract Python UDF from Aggregate", Once, ExtractPythonUDFFromAggregate) :+
    Batch("Prune File Source Table Partitions", Once, PruneFileSourcePartitions) :+
    Batch("Push down operators to data source scan", Once, PushDownOperatorsToDataSource)) ++
    postHocOptimizationBatches :+
    Batch("User Provided Optimizers", fixedPoint, experimentalMethods.extraOptimizations: _*)

Optimizer:

def batches: Seq[Batch] = {
    val operatorOptimizationRuleSet =
      Seq(
        // Operator push down
        PushProjectionThroughUnion,
        ReorderJoin,
        EliminateOuterJoin,
        PushPredicateThroughJoin,
        PushDownPredicate,
        LimitPushDown,
        ColumnPruning,
        InferFiltersFromConstraints,
        // Operator combine
        CollapseRepartition,
        CollapseProject,
        CollapseWindow,
        CombineFilters,
        CombineLimits,
        CombineUnions,
        // Constant folding and strength reduction
        NullPropagation,
        ConstantPropagation,
        FoldablePropagation,
        OptimizeIn,
        ConstantFolding,
        ReorderAssociativeOperator,
        LikeSimplification,
        BooleanSimplification,
        SimplifyConditionals,
        RemoveDispensableExpressions,
        SimplifyBinaryComparison,
        PruneFilters,
        EliminateSorts,
        SimplifyCasts,
        SimplifyCaseConversionExpressions,
        RewriteCorrelatedScalarSubquery,
        EliminateSerialization,
        RemoveRedundantAliases,
        RemoveRedundantProject,
        SimplifyCreateStructOps,
        SimplifyCreateArrayOps,
        SimplifyCreateMapOps,
        CombineConcats) ++
        extendedOperatorOptimizationRules

    val operatorOptimizationBatch: Seq[Batch] = {
      val rulesWithoutInferFiltersFromConstraints =
        operatorOptimizationRuleSet.filterNot(_ == InferFiltersFromConstraints)
      Batch("Operator Optimization before Inferring Filters", fixedPoint,
        rulesWithoutInferFiltersFromConstraints: _*) ::
      Batch("Infer Filters", Once,
        InferFiltersFromConstraints) ::
      Batch("Operator Optimization after Inferring Filters", fixedPoint,
        rulesWithoutInferFiltersFromConstraints: _*) :: Nil
    }

    (Batch("Eliminate Distinct", Once, EliminateDistinct) ::
    // Technically some of the rules in Finish Analysis are not optimizer rules and belong more
    // in the analyzer, because they are needed for correctness (e.g. ComputeCurrentTime).
    // However, because we also use the analyzer to canonicalized queries (for view definition),
    // we do not eliminate subqueries or compute current time in the analyzer.
    Batch("Finish Analysis", Once,
      EliminateSubqueryAliases,
      EliminateView,
      ReplaceExpressions,
      ComputeCurrentTime,
      GetCurrentDatabase(sessionCatalog),
      RewriteDistinctAggregates,
      ReplaceDeduplicateWithAggregate) ::
    //////////////////////////////////////////////////////////////////////////////////////////
    // Optimizer rules start here
    //////////////////////////////////////////////////////////////////////////////////////////
    // - Do the first call of CombineUnions before starting the major Optimizer rules,
    //   since it can reduce the number of iteration and the other rules could add/move
    //   extra operators between two adjacent Union operators.
    // - Call CombineUnions again in Batch("Operator Optimizations"),
    //   since the other rules might make two separate Unions operators adjacent.
    Batch("Union", Once,
      CombineUnions) ::
    Batch("Pullup Correlated Expressions", Once,
      PullupCorrelatedPredicates) ::
    Batch("Subquery", Once,
      OptimizeSubqueries) ::
    Batch("Replace Operators", fixedPoint,
      ReplaceIntersectWithSemiJoin,
      ReplaceExceptWithFilter,
      ReplaceExceptWithAntiJoin,
      ReplaceDistinctWithAggregate) ::
    Batch("Aggregate", fixedPoint,
      RemoveLiteralFromGroupExpressions,
      RemoveRepetitionFromGroupExpressions) :: Nil ++
    operatorOptimizationBatch) :+
    Batch("Join Reorder", Once,
      CostBasedJoinReorder) :+
    Batch("Decimal Optimizations", fixedPoint,
      DecimalAggregates) :+
    Batch("Object Expressions Optimization", fixedPoint,
      EliminateMapObjects,
      CombineTypedFilters) :+
    Batch("LocalRelation", fixedPoint,
      ConvertToLocalRelation,
      PropagateEmptyRelation) :+
    // The following batch should be executed after batch "Join Reorder" and "LocalRelation".
    Batch("Check Cartesian Products", Once,
      CheckCartesianProducts) :+
    Batch("RewriteSubquery", Once,
      RewritePredicateSubquery,
      ColumnPruning,
      CollapseProject,
      RemoveRedundantProject)
  }

  如上这便是在优化这步的所有的规则和策略例如消除子查询别名,表达式替换、算子下推、常量折叠等优化规则,经过这一步之后,就进入物理计划阶段了。

Spark SQL(6) OptimizedPlan

原文地址:https://www.cnblogs.com/ldsggv/p/13380953.html

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。
转载请注明出处: https://daima100.com/7069.html

(0)
上一篇 2023-03-27
下一篇 2023-03-27

相关推荐

  • MySQL学习笔记(23):SQL安全「建议收藏」

    MySQL学习笔记(23):SQL安全「建议收藏」本文更新于2019-06-29,使用MySQL 5.7,操作系统为Deepin 15.4。 应用可使用PrepareStatement + Bind-Variable来防止SQL注入。 已知的非法符号

    2023-03-25
    135
  • 【赵强老师】MongoDB插入文档[通俗易懂]

    【赵强老师】MongoDB插入文档[通俗易懂]MongoDB是非关系型数据库NoSQL的代表,作为一款可分布式存储的数据库,对文档的操作是MongoDB的重中之重。在本文中,我们将着重为大家介绍如何在MongoDB中插入文档。 MongoDB一共

    2023-02-17
    144
  • infinispan~介绍

    infinispan~介绍国内的infinispan的文章不多,所以基本都是从google和官方api上找的资料,对一些问题的调研确实花了一些经历,但最终还是解决了问题,心情也是更加愉悦! 介绍 infinispan是分布式的

    2023-04-29
    134
  • 数据连接_数据连接电脑

    数据连接_数据连接电脑Pymysql 安装 pipinstallPyMySQL #-*-coding:utf-8-*- importpymysql conn=pymysql.connect(host="192.1

    2022-12-28
    160
  • Python 缩进规则

    Python 缩进规则Python 是一门解释性语言,在语言设计方面采用了缩进的方式来指示代码块。Python 应该是最注重缩进风格的语言之一,没有之一。Python 代码的缩进不仅是语法的一部分,还能够让代码具有更好的可读性,这也是 Python 能够成为首选编程语言之一的原因之一。

    2024-06-14
    53
  • Python中的is 9.0操作符:用于检查对象是否为同一个内存地址

    Python中的is 9.0操作符:用于检查对象是否为同一个内存地址Python中的is操作符可以用于检查两个对象是否指向同一块内存空间。is操作符的作用是比较两个对象在内存中的地址是否相同,而不是比较它们的值是否相等。因此,is操作符比==操作符更为严格。

    2024-01-21
    94
  • Python三元运算符的用法详解

    Python三元运算符的用法详解Python三元运算符是一种简洁的if-else判断形式,可以在一行中进行判断,使代码更加简洁明了。本文将详细介绍Python三元运算符的使用方法。

    2024-08-26
    17
  • 用Python执行CMD命令

    用Python执行CMD命令在计算机领域中,我们经常需要通过命令行工具来完成一些任务。而在Python程序中,我们也可以通过执行CMD命令来完成某些操作或获取某些数据。本文将介绍如何在Python代码中执行CMD命令,以及一些常用的技巧。

    2024-08-08
    22

发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注