and the Zoom video will be posted on Blackboard afterwards.
Students can watch the video clips anytime they want.
No textbook will be used.
Instead, award-winning, interactive, informative, and practical lecture notes (based on books, papers, online documents, and user manuals) and detailed, precise class instructions will be provided.
Collectively, the lecture notes and instructions are more like a small book, which supplies much more information than regular notes do and makes studying the subjects much easier.
Students should have no problem learning the subjects or taking the exams after studying the notes and doing the programming exercises.
| Week | Class | Topic | Due | Where |
| 0 |  | 0. Computer Career and Data Research & Technologies |  |  |
|  |  | 0.1 A computer career |  |  |
|  |  | 0.2 Data research |  |  |
|  |  | 0.3 Data technologies |  |  |
| 1 | 08/28 08/30 | 1. Introduction to DATA 525 |  |  |
|  |  | 1.1 Course introduction |  |  |
|  |  | 1.2 Data life cycle |  |  |
|  |  | 1.3 Topics covered |  |  |
| 2 | 09/04 09/06 | 2. Programming Exercise I |  |  |
|  |  | 2.1 Specifications |  |  |
|  |  | 2.2 Web page download |  |  |
|  |  | 2.3 Code sample |  |  |
|  | 09/04 | Last day to add a course or drop without record (100% refund). Last day to add audit or change to/from audit. Last day to receive a refund on a dropped class. Drops after the last day to add will appear on a transcript. |  |  |
|  | 09/02 | Holiday, Labor Day (Monday); no classes |  |  |
| 3 | 09/09 09/11 09/13 | 3. Essential Technologies for Exercise Construction |  |  |
|  |  | 3.1 Essential software and tools |  |  |
|  |  | 3.2 Using Linux |  |  |
|  |  | 3.3 Writing HTML scripts |  |  |
| 4 | 09/16 09/18 09/20 | 4. PHP (HyperText Preprocessor) |  |  |
|  |  | 4.1 LAMP |  |  |
|  |  | 4.2 PHP |  |  |
|  |  | 4.3 MySQL |  |  |
| 5 | 09/23 09/25 09/27 | 5. Web Search Services |  |  |
|  |  | 5.1 The World Wide Web |  |  |
|  |  | 5.2 Web page information |  |  |
|  |  | 5.3 Web search methods |  |  |
| 6 | 09/30 10/02 10/04 | 6. Information Retrieval (IR) |  |  |
|  |  | 6.1 Various IR methods |  |  |
|  |  | 6.2 Automatic indexing methods |  |  |
|  |  | 6.3 Data classification and clustering | EX I |  |
| 7 | 10/07 10/11 | 7. The PageRank Algorithm |  |  |
|  |  | 7.1 Background |  |  |
|  |  | 7.2 The PageRank algorithm |  |  |
|  |  | 7.3 Computing PageRank scores |  |  |
|  | 10/09 | Exam I (for both on-campus and on-line students; 6:30pm – 8:30pm, Wednesday) |  |  |
| 8 | 10/14 10/16 10/18 | 8. Firebase Database |  |  |
|  |  | 8.1 Programming Exercise II |  |  |
|  |  | 8.2 Introduction to Firebase |  |  |
|  |  | 8.3 Using Firebase |  |  |
| 9 | 10/21 10/23 10/25 | 9. TensorFlow |  |  |
|  |  | 9.1 TFJS operations |  |  |
|  |  | 9.2 TFJS models |  |  |
|  |  | 9.3 TFJS visor |  |  |
| 10 | 10/28 10/30 11/01 | 10. A TensorFlow.js Example |  |  |
|  |  | 10.1 Example introduction |  |  |
|  |  | 10.2 Example model |  |  |
|  |  | 10.3 Example training |  |  |
| 11 | 11/04 11/06 11/08 | 11. JavaScript |  |  |
|  |  | 11.1 JavaScript syntax |  |  |
|  |  | 11.2 JavaScript instructions |  |  |
|  |  | 11.3 JavaScript examples |  |  |
| 12 | 11/13 11/15 | 12. Decision Trees |  |  |
|  |  | 12.1 Background |  |  |
|  |  | 12.2 Measuring impurity |  |  |
|  |  | 12.3 Information gain |  |  |
|  | 11/15 | Last day to change to or from S/U grading. Last day to change to or from audit grading. Last day to drop a full-term course or withdraw from school. |  |  |
|  | 11/11 | Holiday, Veterans Day (Monday); no classes |  |  |
| 13 | 11/18 11/22 | 13. k-Nearest Neighbors (kNN) Algorithm |  |  |
|  |  | 13.1 Background |  |  |
|  |  | 13.2 kNN for prediction and smoothing |  |  |
|  |  | 13.3 Strengths and weaknesses |  |  |
|  | 11/20 | Exam II (for both on-campus and on-line students; 6:30pm – 8:30pm, Wednesday) |  |  |
| 14 | 11/25 | 14. Artificial Neural Networks (ANNs) |  |  |
|  |  | 14.1 Artificial intelligence |  |  |
|  |  | 14.2 Backpropagation |  |  |
|  |  | 14.3 Genann: a minimal ANN |  |  |
|  | 11/27 11/28 11/29 | Holidays, Thanksgiving Break (WeThFr); no classes |  |  |
| 15 | 12/02 12/04 12/06 | 15. Data Processing and Mining |  |  |
|  |  | 15.1 Data science |  |  |
|  |  | 15.2 Data warehouse |  |  |
|  |  | 15.3 Data fusion |  |  |
| 16 | 12/09 12/11 | 16. Data Mining Concepts |  |  |
|  |  | 16.1 Introduction to data mining |  |  |
|  |  | 16.2 Data mining steps |  |  |
|  |  | 16.3 Data mining techniques | EX II |  |
| 17 | 12/18 | Final exam (for both on-campus and on-line students; 6:30pm – 8:30pm, Wednesday) |  |  |
| 18 | 12/24 | Grades posted before noon, Tuesday |  |  |
According to US News, the Best Tech Jobs of 2024 are as follows:
- Software developer (median salary: $127,260)
- IT manager (not developer; median salary: $164,070)
- Information security analyst (not developer; median salary: $112,000)
- Data scientist (median salary: $103,500)
- Web developer (median salary: $78,580)
- Computer systems analyst (not developer; median salary: $102,240)
- Computer network architect (not developer; median salary: $126,900)
- Database administrator (including developing; median salary: $99,890)
- Computer support specialist (not developer; median salary: $57,890)
- Computer systems administrator (not developer; median salary: $90,520)
- Computer programmer (median salary: $97,800)
Computer science is different from many other disciplines (like electrical engineering).
It is more like a professional school (such as a culinary school), which emphasizes practical work over subject studies, because many IT companies want new recruits to start contributing immediately.
There are three kinds of computing personnel:
- Developers:
  - Positions (plenty): Developers of front-end and back-end web pages, mobile apps, and all kinds of software
  - Skills (more stable): Programming languages (such as C++ and Java), web programming, mobile app development, data processing and mining (including databases), and data structures & algorithms
- Practitioners:
  - Positions (not many): Experienced personnel such as data scientists, database or system administrators, security analysts, and network architects (more application and configuration work, less development)
  - Skills (based on the needs of companies): Databases, data warehousing, data lakes, Hadoop, MapReduce, Linux, SPSS, SAS, Cognos, MATLAB, Tableau, etc.
- Researchers:
  - Industrial positions (few and based on the needs of corporations): High-quality personnel required for advanced areas such as artificial intelligence, security, computer vision, autonomous driving, and speech recognition
  - Academic positions/trends (few and changing with government policies): ❓ ⇐ artificial intelligence ⇐ big data ⇐ high-performance computing ⇐ security ⇐ (mobile) networks
Unless you have an impressive resume or a strong connection, practicing tens or hundreds of the questions posted on LeetCode is a must in order to secure a job at corporations (like Google and Facebook).
Otherwise, your chance of answering the questions correctly is low because of their high difficulty and time constraint.
In addition, you need to create a LinkedIn page to show your achievements, and you may consider uploading your projects to GitHub to showcase them.
Remark I:
Terminology and definitions will be discussed minimally in this course.
Instead, effective methods and practical work will be emphasized and enforced.
Remark II:
Unlike disciplines such as databases or the World Wide Web, data engineering and mining (DEM) is one of the disciplines (like image processing or artificial intelligence) without a coherent set of methods or algorithms.
DEM uses many methods (such as artificial neural networks or relevance feedback), and each method is usually not closely related to the others (like decision trees or sequential pattern mining).
Remark III:
A wide variety of methods have been used by DEM, and the current methods are rather complicated. To show what data engineering and mining (DEM) is within one semester, this course investigates a small number of fundamental topics instead of many advanced ones. Students can then use this training to adapt appropriate methods to the problems they encounter in the future.
Remark IV:
Take the following steps to conduct research:
- Identify a problem.
- Study related literature and methods.
- Create/adapt a method to solve/suit the problem.
- Figure out how to improve the method.
- Complete the implementation.
- Test the system to ensure it is correct.
- Evaluate the system, including comparisons.
- Publish the results.
Instructor’s qualifications:
The instructor’s current research includes mobile computing and information retrieval.
He has applied various information retrieval methods (such as artificial neural networks, finite-state machines, and association-rule and sequential-pattern mining) to mobile applications and web searches.
The instructor has published more than 100 research publications and advised more than 50 graduate students.
Most of the research topics are related to (mobile) data engineering and mining.
University of North Dakota Course Description (DATA 525) —
This course studies theoretical and applied issues related to data engineering and mining.
Data engineering is to identify, investigate, and analyze the underlying principles in the design and effective use of information systems; and data mining is to discover patterns in large data sets and transform the patterns into a comprehensible structure for further applications.
The following topics are covered: data collection, data preparation, data indexing and storage, data processing and analysis, data classification and clustering, knowledge discovery, information retrieval, data visualization, data sharing, data applications, and some other special topics.
Data Science from Wikipedia —
Data science is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.
It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
Data Engineering from IEEE Computer Society Data Engineering Bulletin —
The role of data in the design, development, mining and utilization of information systems:
- Databases and the World Wide Web,
- Mining of semistructured data, metadata and XML,
- Heterogeneous, distributed, parallel and mobile databases,
- Data warehousing and OLAP,
- Data, text and web mining,
- Optimization of query processing and database architectures,
- Indexing, access methods and data structures,
- Temporal, spatial, scientific, statistical, biological databases, and
- Security and integrity control.
Data Engineering from Data & Knowledge Engineering —
Data engineering is to identify, investigate and analyze the underlying principles in the design and effective use of database systems:
- Representation and manipulation of data,
- Architectures of database systems,
- Construction of databases,
- Applications, case studies, and mining issues, and
- Tools for specifying and developing databases using tools based on linguistics or human machine interface principles.
Data Management from Wikipedia —
Data management comprises all the disciplines related to managing data as a valuable resource:
- Data governance,
- Data architecture, analysis and design,
- Database management,
- Data security management,
- Data quality management,
- Reference and master data management,
- Data warehousing and business intelligence management,
- Data, text and web mining,
- Optimization of query processing and database architectures,
- Indexing, access methods and data structures,
- Temporal, spatial, scientific, statistical, biological databases, and
- Security and integrity control.
Each student is required to build the following two systems:
- a focused web search engine based on a data life cycle and
- a data mining system using Firebase and TensorFlow.
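As a rough illustration of the indexing and searching stages of a focused web search engine, the following minimal Python sketch builds an inverted index over a few pages and ranks them by summed term frequency. It is only a sketch under stated assumptions, not the exercise specification: the page URLs and texts are made up, the function names (`tokenize`, `build_index`, `search`) are hypothetical, and term-frequency ranking stands in for whatever ranking method the exercise actually requires.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: term -> {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Rank pages by the summed frequency of the query terms."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for url, tf in index.get(term, {}).items():
            scores[url] += tf
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical pages standing in for downloaded web pages.
pages = {
    "a.example/knn": "kNN classifies data by nearest neighbors",
    "a.example/tree": "decision trees split data by information gain",
    "a.example/rank": "PageRank scores web pages by links",
}
index = build_index(pages)
print(search(index, "data gain"))
```

In a full system the `pages` dictionary would be filled by the page-download step and the index would be stored in a database rather than in memory.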
An Internet-Enabled and Mobile Database Course Sequence —
This is part of an Internet/mobile-enabled database course sequence offered by me:
CSCI 260 .NET and World Wide Web Programming
⇓
CSCI 457 Electronic and Mobile Commerce Systems
⇓
DATA 520 Databases
⇓
CSCI 513 Advanced Database Systems
⇓
CSCI 515 Data Engineering and Mining
The following platforms, software, and tools used in these courses greatly help students land a decent job:
- CSCI 260 (.NET and World Wide Web Programming) to build database-driven websites by using
  - Microsoft Access database,
  - Microsoft ASP.NET,
  - Microsoft C# or Visual Basic,
  - Microsoft .NET, and
  - Microsoft Visual Studio.
- CSCI 457 (Electronic and Mobile Commerce Systems) to build electronic and mobile commerce systems by using
  - Android programming,
  - Android-server-database connection,
  - (L) Linux operating system,
  - (A) Apache web server,
  - (M) MySQL database, and
  - (P) PHP.
- DATA 520 (Databases) to build Internet/mobile-enabled database systems by using
  - Android programming,
  - Android-server-database connection,
  - JDBC (Java Database Connectivity),
  - Oracle database, and
  - Relational database design and SQL.
- CSCI 513 (Advanced Database Systems) to build Internet-enabled and embedded database systems by using
  - Android programming,
  - Android SQLite embedded database,
  - JDBC (Java Database Connectivity),
  - Object-relational SQL and PL/SQL, and
  - Oracle (an object-relational database).
- CSCI 515 (Data Engineering and Mining) to build Internet-enabled data-mining systems to discover knowledge from a large set of data by using
  - Data mining and knowledge discovery,
  - Internet-enabled Firebase database,
  - Information retrieval, and
  - Internet-enabled TensorFlow.