(Invited)
Start: Monday, 24.8. 10:00
End: Wednesday, 26.8. 14:15
This course introduces the core principles of big data systems and analytics, with particular emphasis on handling large datasets in a distributed setting. The curriculum focuses on distributed computing models such as Hadoop and HPCC Systems, covering block storage, file systems, MapReduce jobs, and the CAP theorem. Students will gain a solid understanding of batch processing, in-memory distributed processing, and stream processing. The course also examines the architecture and functionality of key components of the Hadoop ecosystem, including Flume, Sqoop, HBase, Hive, and Pig, for analyzing both structured and unstructured data. Practical demonstrations show how these tools can be used for comprehensive data analysis in a distributed environment. Participants will also apply these concepts to real-world problems.
