![]() Non-cooperative sources Snapshot sources – provides only full copy of source Specific sources – each is different, e.g., legacy systems Logged sources – writes change log (DB log) Queryable sources – provides query interface, e.g., SQL.Dimensions must be updated before facts The relevant dimension rows for new facts must be in place Special key considerations if initial load must be performed again © Torben Bach Pedersen 2006 - DWML course 11 Types of Data Sources.Incremental update Move only changes since last load Done periodically (./month/week/day/hour/.) after DW start Less heavy - smaller data volumes.Initial load ETL for all data up till now Done when DW is started the first time Often problematic to get correct historical data Very heavy - large data volumes.Load of dimensions Small dimensions: replace Large dimensions: load only changes © Torben Bach Pedersen 2006 - DWML course 10 Building Fact Tables.Handling dimension changes Described in last lecture Find newest DW key for a given production key Table for mapping production keys to DW keys must be updated.Static dimension table Relatively easy? Assignment of keys: production keys to DW using table Combination of data sources: find common key? Check one-one and one-many relationships using sorting.Allows centralized backup/recovery Often too time consuming to initial load all data marts by failure Thus, backup/recovery facilities needed Better to do this centrally in DSA than in all data marts © Torben Bach Pedersen 2006 - DWML course 8 ETL Construction Process Plan 1) 2) 3) Construction of dimensions 4) 5) 6) Make high-level diagram of source-destination flow Test, choose and implement ETL tool Outline complex transformations, key generation and job sequence for every destination table Construct and test building static dimension Construct and test change mechanisms for one dimension Construct and test remaining dimension builds Construction of fact tables and automation 7) 8) 9) 10) Construct and test initial fact table build Construct and test incremental update Construct and test aggregate build (you do this later) Design, construct, and test ETL automation © Torben Bach Pedersen 2006 - DWML course 9 Building Dimensions.Finished dimensions copied from DSA to relevant marts. ![]() ![]() RDBMS or flat files? (DBMS have become better at this) Sequential operations (few) on large data volumes Performed by central ETL logic Easily restarted No need for locking, logging, etc.Transit storage for data underway in the ETL process Transformations/cleansing done here.source data changes © Torben Bach Pedersen 2006 - DWML course 4 ETL/DW Refreshment © Torben Bach Pedersen 2006 - DWML course 5 Refreshment Workflow © Torben Bach Pedersen 2006 - DWML course 6 ETL In The Architecture Query side ETL side Metadata Data sources Presentation servers -Warehouse Browsing -Access and Security Data marts with aggregate-only data -Query Management - Standard Reporting Conformed -Activity Monitor Data - Extract - Transform - Load Data Staging Area Query Services Warehouse Bus dimensions and facts Reporting Tools Desktop Data Access Tools Data mining Operationelle systemer Data marts with atomic data Data Service Element © Torben Bach Pedersen 2006 - DWML course 7 Data Staging Area Refreshment phase Keep the DW up-to-date wrt.Loading phase First load/population of the DW Based on all data in sources.Design phase Modeling, DB design, source selection,….© Torben Bach Pedersen 2006 - DWML course 3 DW Phases Load Load data into DW Build aggregates, etc.Transform Transform data to DW format Build keys, etc.The most time-consuming process in DW development Often, 80% of development time is spent on ETL.The most underestimated process in DW development.MS Integration Services A concrete ETL tool Demo Example ETL flow © Torben Bach Pedersen 2006 - DWML course 2 The ETL Process.General ETL issues The ETL/DW refreshment process Building dimensions Building fact tables Extract Transformations/cleansing Load.Extract, Transform, Load (ETL) Torben Bach Pedersen Aalborg University ETL Overview
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |