An efficient heuristic for logical optimization of ETL workflows
Prudhvi Sreenivasa Kumar
Published in Springer Verlag
2011
Volume: 84 LNBIP
Pages: 68-83
Abstract
An ETL process extracts data from various sources, transforms it, and loads it into a data warehouse. In this paper, we analyse an ETL flow and observe that only some of its dependencies are essential, while the others merely represent the flow of data. For linear flows, we exploit the underlying dependency graph and develop a greedy heuristic that determines a reordering which significantly improves the quality of the flow. Rather than adopting a state-space search approach, we use the cost functions and selectivities to determine the best option at each position, proceeding from right to left. To deal with complex flows, we identify activities that can be transferred between their linear segments and position those activities appropriately. We then use the reorderings of the linear segments to obtain a cost-optimal, semantically equivalent flow for a given complex flow. Experimental evaluation shows that the proposed techniques optimize ETL flows better, and with much less effort, than existing methods. © 2011 Springer-Verlag Berlin Heidelberg.
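The abstract does not give the heuristic itself, so the following is only a minimal illustrative sketch of a right-to-left greedy reordering of a linear flow, not the authors' algorithm. Everything in it is an assumption made for illustration: the cost model (each activity pays a per-row cost times its input size, and shrinks the data by its selectivity), the selection rule (put the activity that is cheapest at the current position there), and all names such as `flow_cost`, `greedy_right_to_left`, and `must_precede`. The sketch relies on one fact: the input size at a position depends only on which activities precede it, not on their order, so when positions are filled from the right that size is already known.

```python
from math import prod


def flow_cost(order, cost, sel, n_rows=1.0):
    """Total cost of a linear flow under the assumed cost model."""
    total, rows = 0.0, n_rows
    for a in order:
        total += cost[a] * rows   # pay per-row cost on the current input size
        rows *= sel[a]            # selectivity shrinks (or keeps) the data
    return total


def greedy_right_to_left(activities, cost, sel, must_precede):
    """Fill positions from last to first (hypothetical selection rule).

    must_precede is a set of (a, b) pairs: a has to run before b.
    """
    remaining = set(activities)
    reversed_order = []
    while remaining:
        # An activity may be placed last among the remaining ones only if
        # no unplaced activity is required to come after it.
        eligible = [a for a in remaining
                    if not any((a, b) in must_precede for b in remaining)]
        if not eligible:
            raise ValueError("dependencies contain a cycle")

        def position_cost(a):
            # Relative input size here: product of selectivities of all
            # other remaining activities, which will all precede a.
            return cost[a] * prod(sel[b] for b in remaining if b != a)

        # sorted() only makes tie-breaking deterministic.
        best = min(sorted(eligible), key=position_cost)
        reversed_order.append(best)
        remaining.remove(best)
    return reversed_order[::-1]


# Made-up example: the filter needs the converted field, the lookup is costly.
cost = {"lookup": 5.0, "convert": 2.0, "filter": 1.0}
sel = {"lookup": 1.0, "convert": 1.0, "filter": 0.1}
deps = {("convert", "filter")}

original = ["lookup", "convert", "filter"]
reordered = greedy_right_to_left(original, cost, sel, deps)
```

In this made-up example the expensive lookup ends up after the selective filter, while the essential dependency (convert before filter) is kept. A greedy choice like this is not guaranteed to be optimal in general.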
About the journal
Journal: Lecture Notes in Business Information Processing
Publisher: Springer Verlag
ISSN: 1865-1348
Open Access: No
Concepts (19)
  •  Cost functions
  •  Heuristic methods
  •  Metadata
  •  Optimization
  •  Rhenium compounds
  •  Complex flow
  •  Data integration
  •  DEPENDENCY GRAPHS
  •  ETL OPTIMIZATION
  •  ETL PROCESS
  •  Experimental evaluation
  •  FLOW OF DATA
  •  GREEDY HEURISTICS
  •  LINEAR FLOWS
  •  LINEAR SEGMENTS
  •  LOGICAL OPTIMIZATION
  •  STATE-SPACE
  •  WORK-FLOWS
  •  Data warehouses