
Browsing by Subject "Data Lake"


  • Heinonen, Jyrki (2020)
    The main theme of the conventional data warehouse is a 'single version of truth', achieved with either dimensional modeling or normalized 3NF modeling. Both techniques have issues because data is cleansed and transformed on its way into the warehouse, so the data ends up changed and information is lost. Data Vault modeling, a response to these issues, is detail oriented and tracks history, keeping the audit trail intact. This means we have a 'single version of facts', or 'all the data, all of the time'. The Data Vault methodology and architecture can handle Big Data and NoSQL, which this work also covers in the Data Lake section. Data Lake tools have evolved strongly during the last decade and respond to ever expanding data volumes with distributed computing tactics. A Data Lake can also ingest different types of structured, semi-structured and unstructured data. Data warehouse (and Data Lake) processing is moving from on-premises server rooms to cloud data centers. Apache and Google in particular have developed and inspired many new tools that can process data warehouse data at petabyte scale. The challenge now is that not only do operational systems generate data for the data warehouse, but huge amounts of machine-generated data must also be processed and analyzed on these practically infinitely scalable platforms. A data warehouse solution also has to cover machine-learning requirements. So the modernization of the data warehouse is not over; all of these methodologies, architectures and tools remain in use, and the trick is to choose the right tool for the right job.
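
    To make the insert-only, history-preserving idea concrete, the sketch below models a Data Vault-style hub and satellite in SQLite via Python. It is a minimal illustration under stated assumptions, not the thesis's own design: the names hub_customer, sat_customer, customer_bk, load_ts and record_source are hypothetical, and a real Data Vault would add hash keys, link tables and richer load metadata.

        import sqlite3
        from datetime import datetime, timezone

        # Minimal, illustrative Data Vault sketch: one hub (business keys only)
        # and one satellite (descriptive attributes, insert-only). All names
        # here are assumptions for illustration.
        conn = sqlite3.connect(":memory:")
        conn.executescript("""
        CREATE TABLE hub_customer (
            customer_bk   TEXT PRIMARY KEY,   -- business key, never updated
            load_ts       TEXT NOT NULL,
            record_source TEXT NOT NULL
        );
        CREATE TABLE sat_customer (
            customer_bk   TEXT NOT NULL REFERENCES hub_customer(customer_bk),
            load_ts       TEXT NOT NULL,      -- each change is a new row
            record_source TEXT NOT NULL,
            name          TEXT,
            city          TEXT,
            PRIMARY KEY (customer_bk, load_ts)
        );
        """)

        def load_customer(bk, name, city, source):
            """Insert-only load: the hub row is created once; every attribute
            change lands as a new satellite row, so no history is overwritten."""
            ts = datetime.now(timezone.utc).isoformat()
            conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?)",
                         (bk, ts, source))
            conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                         (bk, ts, source, name, city))
            conn.commit()

        load_customer("C-1001", "Acme Oy", "Helsinki", "CRM")
        load_customer("C-1001", "Acme Oy", "Espoo", "CRM")  # city changed: new row, old one kept

        # 'All the data, all of the time': the full history stays queryable.
        for row in conn.execute("SELECT customer_bk, load_ts, city "
                                "FROM sat_customer ORDER BY load_ts"):
            print(row)

    Because loads only ever insert, a changed attribute becomes a new satellite row rather than an update, which is what keeps the audit trail intact and yields the 'all the data, all of the time' property described above.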