Information Package / Course Catalogue
Big Data Analysis
Course Code: CSE424
Course Type: Area Elective
Course Group: First Cycle (Bachelor's Degree)
Education Language: English
Work Placement: N/A
Theory: 2
Practice: 2
Credit: 3
Lab: 0
ECTS: 6
Objectives of the Course

The recent explosion of social media and the computerization of every aspect of economic activity have created large volumes of mostly unstructured data: web logs, videos, speech recordings, photographs, e-mails, tweets, and similar. In parallel, computers keep getting more powerful and storage keeps getting cheaper. Today we can store huge volumes of data reliably and cheaply, analyze them efficiently, and extract business and socially relevant information. This course introduces several key technologies for manipulating, storing, and analyzing big data. It covers selected topics in depth across the big data life cycle, from data generation, storage, management, and transfer to analytics, with a focus on the state-of-the-art technologies, tools, architectures, and systems that make up big data computing solutions in high-performance networks. Real-life big data applications in various domains (particularly in the sciences) serve as use cases to illustrate the development, deployment, and testing of a wide spectrum of emerging big data solutions. The course also covers data mining and machine learning algorithms for analyzing very large amounts of data.

Course Content

The course material will be drawn from textbooks as well as recent research literature. The following topics will be covered this year: Hadoop, MapReduce, association rules, large-scale supervised machine learning, data streams, clustering, NoSQL and related Hadoop-ecosystem tools (Cassandra, Pig, Hive), and applications including recommendation systems, the Web, and security.

Name of Lecturer(s)
Lec. Hüseyin ABACI
Learning Outcomes
1. By providing a balanced view of "theory" and "practice," the course should allow the student to understand, use, and build practical big data analytics and management systems. The course is intended to provide a basic understanding of the issues and problems involved in massive on-line repository systems, a knowledge of currently practical techniques for satisfying the needs of such a system, and an indication of the current research approaches that are likely to provide a basis for tomorrow's solutions.
2. Learning big data concepts, terminology, data analytics characteristics, and types of big data, such as the 5 Vs and structured, unstructured, and semi-structured data and metadata.
3. Comprehension of data analysis techniques and topics such as quantitative analysis, qualitative analysis, data mining, statistical analysis, A/B testing, correlation, and regression analysis.
4. Comprehensive knowledge of storage concepts such as clusters, distributed file systems, RDBMS, NoSQL, and in-memory storage, and of big data processing concepts such as parallel, distributed, and batch data processing.
5. Comprehensive knowledge of parallel processing and other design patterns for big data processing, using the Cloudera virtual machine, HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator), and Hue.
Recommended or Required Reading
1. Thomas Erl, Wajid Khattak, and Paul Buhler. Big Data Fundamentals: Concepts, Drivers & Techniques (1st ed.). Prentice Hall Press, Upper Saddle River, NJ, USA, 2016.
2. Nathan Marz and James Warren. Big Data: Principles and Best Practices of Scalable Realtime Data Systems. Manning Publications, 2015.
3. Tom White. Hadoop: The Definitive Guide. O'Reilly, 2015.
Weekly Detailed Course Contents
Week 1 - Theoretical
Introduction to Big Data: Covers concepts, terminology, characteristics, and types of big data, such as the 5 Vs and structured, unstructured, and semi-structured data and metadata. Also covers business and research motivations and drivers for big data.
Week 2 - Theoretical
Storing and Analysing Big Data: Covers storage concepts such as clusters, distributed file systems, RDBMS, NoSQL, sharding, and in-memory storage. Also covers big data processing concepts such as parallel, distributed, and batch data processing, and Hadoop.
Week 3 - Theoretical
Big Data Analysis Techniques: Covers analysis techniques and topics such as quantitative analysis, qualitative analysis, data mining, statistical analysis, machine learning, semantic analysis, and visual analysis of data.
Week 4 - Theoretical
MapReduce Framework and Hadoop: Covers parallel processing and other design patterns for big data processing, the Cloudera virtual machine, HDFS (Hadoop Distributed File System), YARN (Yet Another Resource Negotiator), and Hue.
Week 5 - Theoretical
MapReduce API and Basic Programming with Java: We will examine some advanced details of the Hadoop MapReduce Java API and programming.
Week 6 - Theoretical
Using Hive: Hive is a "data warehouse" built on top of HDFS and Hadoop. It allows SQL-like (HiveQL) queries over data stored in HDFS.
Week 7 - Theoretical
Using Spark: Spark is a memory-based evolution of the MapReduce framework with considerably faster execution. Covers Spark RDDs (Resilient Distributed Datasets).
Week 8 - Theoretical
Using Spark: Spark is a memory-based evolution of the MapReduce framework with considerably faster execution. Covers Spark RDDs (Resilient Distributed Datasets).
Week 9 - Theoretical
Spark Streaming, Kafka and Cassandra: This combination is becoming a standard stack for processing fast data.
Week 10 - Theoretical
Spark MLlib, Machine Learning with Spark: We will review a few algorithms that can learn from and make predictions on data.
Week 11 - Theoretical
Visualizing Large Data Sets: We will introduce a JavaScript API and techniques that enable more insightful use of graphs and charts to present the content and features of large data sets.
Week 12 - Theoretical
Visualizing Large Data Sets: We will introduce a JavaScript API and techniques that enable more insightful use of graphs and charts to present the content and features of large data sets.
Week 13 - Theoretical
Advanced Topics in Big Data and Experiments: High-performance networking for big data movement, and big data scientific workflow management and optimization.
Week 14 - Theoretical
Advanced Topics in Big Data and Experiments: High-performance networking for big data movement, and big data scientific workflow management and optimization.
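The MapReduce model covered in Weeks 4 and 5 can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not the Hadoop Java API used in the course; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, the step a framework
    performs between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input; each string stands in for one record of a file.
    print(word_count(["big data big clusters", "data streams"]))
```

In a real cluster the map and reduce functions would run in parallel on different nodes, and the shuffle would move data over the network; the sketch only shows how the three phases fit together.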
Assessment Methods and Criteria
Type of Assessment     Count   Percent
Midterm Examination    1       20%
Final Examination      1       34%
Practice               10      20%
Quiz                   3       6%
Assignment             1       20%
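As a rough illustration of how these percentages combine, the sketch below computes a weighted overall score. It assumes each component is scored on a 0-100 scale and that the components are combined linearly; the catalogue does not state the grading formula, and the sample scores are hypothetical.

```python
# Weights (percent) taken from the assessment table.
WEIGHTS = {
    "Midterm Examination": 20,
    "Final Examination": 34,
    "Practice": 20,
    "Quiz": 6,
    "Assignment": 20,
}


def weighted_score(scores):
    """Return the weighted average of component scores (each 0-100).

    Assumes a simple linear combination, which is an illustrative
    assumption rather than the official grading rule.
    """
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("assessment weights must add up to 100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    # Hypothetical component scores for one student.
    sample = {
        "Midterm Examination": 70,
        "Final Examination": 80,
        "Practice": 90,
        "Quiz": 60,
        "Assignment": 85,
    }
    print(weighted_score(sample))
```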
Workload Calculation
Activities             Count   Preparation   Time   Total Work Load (hours)
Lecture - Theory       14      0             2      28
Lecture - Practice     14      0             2      28
Assignment             5       0             2      10
Term Project           1       8             7      15
Quiz                   4       5             1      26
Midterm Examination    1       16            2      18
Final Examination      1       20            2      22
TOTAL WORKLOAD (hours): 147
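The stated total can be checked by summing the per-activity totals from the table. The short sketch below does that and also divides by the 6 ECTS credits listed for the course; the variable names are illustrative, and the hours-per-credit figure is a derived value, not one given in the catalogue.

```python
# Per-activity total workload in hours, copied from the table.
workload_hours = {
    "Lecture - Theory": 28,
    "Lecture - Practice": 28,
    "Assignment": 10,
    "Term Project": 15,
    "Quiz": 26,
    "Midterm Examination": 18,
    "Final Examination": 22,
}

ECTS_CREDITS = 6  # from the course header

total_hours = sum(workload_hours.values())
hours_per_ects = total_hours / ECTS_CREDITS

if __name__ == "__main__":
    print(total_hours)      # total workload in hours
    print(hours_per_ects)   # derived hours per ECTS credit
```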
Contribution of Learning Outcomes to Programme Outcomes
Programme Outcomes (PO): PO-1, PO-2, PO-3, PO-4, PO-5, PO-6, PO-7, PO-8, PO-9, PO-10, PO-11
LO-1: 5, 5, 4, 4, 4, 4, 4
LO-2: 5, 4, 4, 4, 5, 4, 4, 4
LO-3: 5, 5, 4, 4, 4, 5, 4
LO-4: 5, 5, 5, 4, 5, 4, 4
LO-5: 5, 4, 5, 4, 4, 4, 5, 4, 4, 5, 4
Adnan Menderes University - Information Package / Course Catalogue
2026