SYLLABUS

DATA 525 Data Engineering and Mining
(a practical and no-nonsense course)
School of Electrical Engineering and Computer Science, University of North Dakota
Fall 2023

Class times: 01:25pm – 02:15pm, MoWeFr
Classroom: Harrington Hall 218
Credit hours: 3
Prerequisites:
  • DATA511 Computing for Data Science I,
  • DATA512 Computing for Data Science II, and
  • DATA513 Mathematical Foundations for Data Science, or
  • Permission of the School of Electrical Engineering and Computer Science
Class pages: http://undcemcs01.und.edu/~wen.chen.hu/course/525/
 
Instructor: Wen-Chen Hu   (my teaching philosophy)
Email: wenchen@cs.und.edu
Office: Upson II 366K
Office hours: 02:30 pm – 04:30 pm, MoWeFr
Zoom link: https://und.zoom.us/j/2489867333


Synchronous Class Delivery
The class lectures will be delivered synchronously via https://und.zoom.us/j/2489867333, and the Zoom recordings will be posted on Blackboard afterward.

Lecture Notes
No textbook will be used. Instead, award-winning, detailed, and precise class instructions and interactive, informative, and practical lecture notes (based on books, papers, online documents, and user manuals) will be provided. Together, the lecture notes and instructions resemble a small book: they supply much more information than regular notes do and make the subjects much easier to study. Students will have no trouble learning the subjects or taking the exams after studying these materials and doing the programming exercises.

Description
This course studies theoretical and applied issues related to data engineering and mining. Data engineering identifies, investigates, and analyzes the underlying principles of the design and effective use of information systems; data mining discovers patterns in large data sets and transforms those patterns into a comprehensible structure for further applications. The following topics are covered:
  • Data crawling, collection, preparation, indexing, storage, searching, ranking, and mining,
  • Information retrieval,
  • Text analysis,
  • Database processing,
  • Database-driven web site construction,
  • Data processing and analysis,
  • Data classification and clustering,
  • Knowledge discovery,
  • Data visualization, sharing, and applications, and
  • Some other special topics.
Each student is required to independently build systems based on the following two models:
  • A data life cycle, and
  • A World Wide Web search engine.
Objectives
After taking this course, students should be able to achieve the following goals, among others:

Evaluations
    Three programming exercises:
      1. Data crawling & collection  ——  12%
      2. Data indexing & searching   ——  12%
      3. Data mining & analytics     ——  16%
    Two exams                        ——  20% each
    Final exam                       ——  20%

Tentative Schedule
    Week                    1  ——  Introduction
    Weeks           2,  3,  4  ——  Programming Exercise I construction
    Weeks 5, 6, 7,  8,  9, 10  ——  Information retrieval and data mining
    Weeks      11, 12, 13, 14  ——  Firebase and data analytics
    Weeks              15, 16  ——  Data mining and management concepts

Remark I
Terminology and definitions will be discussed only minimally in this course. Instead, (i) effective methods and practical work will be emphasized and enforced, and (ii) trends in data engineering and mining will be discussed.

Remark II
Unlike disciplines such as databases or the World Wide Web, data engineering and mining (DEM), like image processing or artificial intelligence, lacks a coherent set of methods or algorithms. DEM uses many methods (such as artificial neural networks or relevance feedback), and each method is usually not closely related to the others (such as decision trees or sequential pattern mining).

Remark III
To show what data engineering and mining (DEM) is within a single semester, this course investigates a small number of fundamental topics rather than many. Students can then use this training to choose appropriate methods for the problems they encounter in the future.

Remark IV
Data engineering and mining (and information retrieval) is a mature subject. A wide variety of methods have been applied to it, and because of this maturity, current methods are rather complicated. To cover more topics, this course introduces fundamental or primitive methods. Students learn how the DEM methods work and may try to enhance them or apply them in their programming exercises.

Remark V
DEM is a well-developed subject, and it is not easy to find a brand-new method. On the other hand, artificial intelligence (AI), data mining (DM), machine learning (ML), and information retrieval (IR) offer plenty of methods that can be used or adapted. To take advantage of both, DEM borrows many methods from AI/DM/ML/IR. However, DEM is not the same as AI/DM/ML/IR because it also addresses the problem of data processing. That is, a data research topic may consist of two parts, DEM and AI/DM/ML/IR, and you should emphasize the former rather than the latter because DEM is more useful and practical.

Instructor’s qualification
The instructor’s current research interests include (mobile) data research and applications, such as (mobile) data security and mining, and mobile/smartphone/spatial/web computing. He has applied various information retrieval methods (such as artificial neural networks, finite-state machines, and association-rule and sequential-pattern mining) to mobile applications and web search. The instructor has published more than 100 research publications and advised more than 50 graduate students. Most of these research topics are related to (mobile) data engineering, management, and mining.

Dishonesty
Under no circumstances will acts of academic dishonesty be tolerated. Any suspected incidents of dishonesty will be promptly referred to the Assistant Dean of Students. Refer to the Code of Student Life, Appendix B.2: Academic Dishonesty.

Disability
Students who need special accommodations for learning or who have special needs are invited to share these concerns or requests with the instructor as soon as possible.