No textbook will be used.
Instead, award-winning, interactive, informative, and practical lecture notes (based on books, papers, online documents, and user manuals) and detailed, precise class instructions will be provided.
Collectively, the lecture notes and instructions are more like a small book: they supply much more information than regular notes do and make the subject much easier to study.
Students should have no problem learning the subject or taking the exams after studying them and doing the programming exercises.
| Week | Class | Topic | Subtopics | Due |
|---|---|---|---|---|
| 0 | | 0. Computer Career and Data Research & Technologies | 0.1 A computer career; 0.2 Data research; 0.3 Data technologies | |
| 1 | 01/14, 01/16 | 1. Introduction to CSCI 515 | 1.1 Course introduction; 1.2 Data life cycle; 1.3 Topics covered | |
| 2 | 01/21, 01/23 | 2. Programming Exercise I | 2.1 Specifications; 2.2 Web page download; 2.3 Code sample | |
| | 01/23 | Last day to add a course or drop without record; last day to add audit or change to/from audit; last day to receive a refund on a dropped class. Drops after the last day to add will appear on a transcript. | | |
| 3 | 01/28, 01/30 | 3. Essential Technologies for Exercise Construction | 3.1 Essential software and tools; 3.2 Using Linux; 3.3 Writing HTML scripts | |
| 4 | 02/04, 02/06 | 4. PHP (HyperText Preprocessor) | 4.1 LAMP; 4.2 PHP; 4.3 MySQL | |
| 5 | 02/11, 02/13 | 5. Web Search Services | 5.1 The World Wide Web; 5.2 Web page information; 5.3 Web search methods | |
| 6 | 02/18, 02/20 | 6. Information Retrieval (IR) | 6.1 Various IR methods; 6.2 Automatic indexing methods; 6.3 Data classification and clustering | EX I |
| 7 | 02/25 | Exam I (for both on-campus and online students; 6:30pm – 8:30pm, Tuesday) | | |
| | 02/27 | 7. The PageRank Algorithm | 7.1 Background; 7.2 The PageRank algorithm; 7.3 Computing PageRank scores | |
| 8 | 03/04, 03/06 | 8. Firebase Database | 8.1 Programming Exercise II; 8.2 Introduction to Firebase; 8.3 Using Firebase | |
| 9 | 03/10 – 03/14 | Spring Break (no classes) | | |
| 10 | 03/18, 03/20 | 10. TensorFlow | 10.1 TFJS operations; 10.2 TFJS models; 10.3 TFJS visor | |
| 11 | 03/25, 03/27 | 11. A TensorFlow.js Example | 11.1 Example introduction; 11.2 Example model; 11.3 Example training | |
| 12 | 04/01, 04/03 | 12. JavaScript | 12.1 JavaScript syntax; 12.2 JavaScript instructions; 12.3 JavaScript examples | |
| 13 | 04/08, 04/10 | 13. Decision Trees | 13.1 Background; 13.2 Measuring impurity; 13.3 Information gain | |
| | 04/11 | Last day to change to or from S/U grading; last day to change to or from audit grading; last day to drop a full-term course or withdraw from school | | |
| 14 | 04/15 | Exam II (for both on-campus and online students; 6:30pm – 8:30pm, Tuesday) | | |
| | 04/17 | 14. k-Nearest Neighbors (kNN) Algorithm | 14.1 Background; 14.2 kNN for prediction and smoothing; 14.3 Strengths and weaknesses | |
| 15 | 04/22, 04/24 | 15. Artificial Neural Networks (ANNs) | 15.1 Artificial intelligence; 15.2 Backpropagation; 15.3 Genann: a minimal ANN | |
| 16 | 04/29, 05/01 | 16. Data Processing and Management | 16.1 Data science; 16.2 Data warehouse; 16.3 Data fusion | |
| 17 | 05/06, 05/08 | 17. Data Mining Concepts | 17.1 Introduction to data mining; 17.2 Data mining steps; 17.3 Data mining techniques | EX II |
| 18 | 05/13 | Final exam (for both on-campus and online students; 6:30pm – 8:30pm, Tuesday) | | |
| 19 | 05/20 | Grades posted before noon, Tuesday | | |
According to US News, the Best Tech Jobs of 2025 are as follows:
- Software developer (median salary: $127,260)
- IT manager (non-developer; median salary: $164,070)
- Information security analyst (non-developer; median salary: $112,000)
- Data scientist (median salary: $103,500)
- Web developer (median salary: $78,580)
- Computer systems analyst (non-developer; median salary: $102,240)
- Computer network architect (non-developer; median salary: $126,900)
- Database administrator (includes development work; median salary: $99,890)
- Computer support specialist (non-developer; median salary: $57,890)
- Computer system administrator (non-developer; median salary: $90,520)
- Computer programmer (median salary: $97,800)
Computer science differs from many other disciplines (such as electrical engineering).
It is more like a professional school (such as a culinary school), which emphasizes practical work over subject study, because many IT companies want new recruits to start contributing immediately.
There are three kinds of computing personnel:
- Developers:
  - Positions (plenty): Developers of front-end and back-end web pages, mobile apps, and all kinds of software
  - Skills (more stable): Programming languages (such as C++ and Java), web programming, mobile app development, data processing and management including databases, and data structures & algorithms
- Practitioners:
  - Positions (not many): Experienced personnel like data scientists, database or system administrators, security analysts, and network architects (more application and configuration work, less development)
  - Skills (based on the needs of companies): Databases, data warehousing, data lakes, Hadoop, MapReduce, Linux, SPSS, SAS, Cognos, MATLAB, Tableau, etc.
- Researchers:
  - Industrial positions (few and based on the needs of corporations): High-quality personnel required for advanced areas like artificial intelligence, security, computer vision, autonomous driving, and speech recognition
  - Academic positions/trends (few and changing with government policies): ❓ ⇐ artificial intelligence ⇐ big data ⇐ high-performance computing ⇐ security ⇐ (mobile) networks
Unless you have an impressive resume or a strong connection, practicing tens or hundreds of the questions posted on LeetCode is a must to secure a job at companies like Google and Facebook.
Otherwise, your chance of answering their interview questions correctly is low because of the questions' high difficulty and the time constraints.
In addition, you should create a LinkedIn page to show your achievements, and you may consider uploading your projects to GitHub to showcase them.
Remark I:
Terminology and definitions will be discussed only minimally in this course.
Instead, (i) effective methods and practical work will be emphasized and enforced, (ii) the trend of (mobile) data engineering and management will be discussed, and (iii) smartphone structures will be studied.
Remark II:
Unlike disciplines such as databases or the World Wide Web, data engineering and management (DEM) is, like image processing or artificial intelligence, a discipline without a coherent set of methods or algorithms.
DEM uses many methods (such as artificial neural networks or relevance feedback), and each method is usually not closely related to the others (such as decision trees or sequential pattern mining).
Remark III:
To show what data engineering and management (DEM) is within one semester, this course investigates a small number of fundamental topics rather than many.
Students can then use this training to choose appropriate methods for the problems they encounter in the future.
Remark IV:
Data engineering and management, like information retrieval, is a mature subject. A wide variety of methods have been applied to it, and because of this maturity the current methods are rather complicated.
To cover more topics, this course introduces fundamental, even primitive, methods.
Students learn how these DEM methods work and may try to enhance them or apply them in their programming exercises.
Remark V:
DEM is a well-developed subject, and it is not easy to find a brand-new method in it.
On the other hand, artificial intelligence (AI), data mining (DM), machine learning (ML), and information retrieval (IR) offer plenty of methods that can be used or adapted.
To take advantage of both, DEM borrows many methods from AI/DM/ML/IR.
However, DEM is not the same as AI/DM/ML/IR because it centers on the problem of data processing.
That is, a data research topic may consist of two parts, DEM and AI/DM/ML/IR, and you should put the emphasis on the former rather than the latter because DEM is more useful and practical.
Remark VI:
Take the following steps to conduct research:
- Identify a problem.
- Study related literature and methods.
- Create/adapt a method to solve/suit the problem.
- Figure out how to improve the method.
- Complete the implementation.
- Test the system to ensure it is correct.
- Evaluate the system including comparisons.
- Publish the results.
Remark VII:
Online asynchronous instruction is also provided for distance students.
It is delivered entirely over the Internet.
For details, see UND Online & Distance Education or DEDP (Distance Engineering Degree Program).
In addition, Zoom (https://und.zoom.us/j/2489867333) or YuJa is used to host and share lecture videos, and ProctorU may be used to proctor the exams.
Instructor’s qualification:
The instructor’s current research includes mobile computing and information retrieval.
He has applied various information retrieval methods (such as artificial neural networks, finite-state machines, and association-rule and sequential-pattern mining) to mobile applications and web searches.
The instructor has published more than 100 research publications and advised more than 50 graduate students.
Most of the research topics are related to (mobile) data engineering and management.
University of North Dakota Course Description (CSCI 515) —
This course studies theoretical and applied research issues related to data
engineering, management, and science. Topics will reflect state-of-the-art
and state-of-the-practice activities in the field. The course focuses on
well-defined theoretical results and empirical studies that have potential
impact on data acquisition, analysis, indexing, management, mining,
retrieval, and storage.
Data Science from Wikipedia —
Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
Data Engineering from IEEE Computer Society Data Engineering Bulletin —
The role of data in the design, development, management and utilization of information systems:
- Databases and the World Wide Web,
- Management of semistructured data, metadata and XML,
- Heterogeneous, distributed, parallel and mobile databases,
- Data warehousing and OLAP,
- Data, text and web mining,
- Optimization of query processing and database architectures,
- Indexing, access methods and data structures,
- Temporal, spatial, scientific, statistical, biological databases, and
- Security and integrity control.
Data Engineering from Data & Knowledge Engineering —
Data engineering is to identify, investigate and analyze the underlying principles in the design and effective use of database systems:
- Representation and manipulation of data,
- Architectures of database systems,
- Construction of databases,
- Applications, case studies, and management issues, and
- Tools for specifying and developing databases using tools based on linguistics or human machine interface principles.
Data Management from Wikipedia —
Data management comprises all the disciplines related to managing data as a valuable resource:
- Data governance,
- Data architecture, analysis and design,
- Database management,
- Data security management,
- Data quality management,
- Reference and master data management,
- Data warehousing and business intelligence management,
- Data, text and web mining,
- Optimization of query processing and database architectures,
- Indexing, access methods and data structures,
- Temporal, spatial, scientific, statistical, biological databases, and
- Security and integrity control.
Each student is required to build the following two systems:
- a focused web search engine based on a data life cycle and
- a data mining system using Firebase and TensorFlow.
An Internet-Enabled and Mobile Database Course Sequence —
This is part of an Internet/mobile-enabled database course sequence offered by me:
CSCI 260 .NET and World Wide Web Programming
⇓
CSCI 457 Electronic and Mobile Commerce Systems
⇓
DATA 520 Databases
⇓
CSCI 513 Advanced Database Systems
⇓
CSCI 515 Data Engineering and Management
The following platforms, software, and tools used in these courses greatly help students land a decent job:
- CSCI 260 (.NET and World Wide Web Programming) to build database-driven websites by using
  - Microsoft Access database,
  - Microsoft ASP.NET,
  - Microsoft C# or Visual Basic,
  - Microsoft .NET, and
  - Microsoft Visual Studio.
- CSCI 457 (Electronic and Mobile Commerce Systems) to build electronic and mobile commerce systems by using
  - Android programming,
  - Android-server-database connection,
  - (L) Linux operating system,
  - (A) Apache web server,
  - (M) MySQL database, and
  - (P) PHP.
- DATA 520 (Databases) to build Internet/mobile-enabled database systems by using
  - Android programming,
  - Android-server-database connection,
  - JDBC (Java Database Connectivity),
  - Oracle database, and
  - Relational database design and SQL.
- CSCI 513 (Advanced Database Systems) to build Internet-enabled and embedded database systems by using
  - Android programming,
  - Android SQLite embedded database,
  - JDBC (Java Database Connectivity),
  - Object-relational SQL and PL/SQL, and
  - Oracle (an object-relational database).
- CSCI 515 (Data Engineering and Management) to build Internet-enabled data-mining systems to discover knowledge from a large set of data by using
  - Data mining and knowledge discovery,
  - Internet-enabled Firebase database,
  - Information retrieval, and
  - Internet-enabled TensorFlow.