
Special Study Guides for Data Warehouse, Data Analytics, Big Data for CS Senior Projects PDF

145 Pages·2016·12.86 MB·English


INDEPENDENT STUDY
Just how much can be learned in a semester under Dr. Chung’s guidance
Nicholas J White
April 24th, 2016 | nickwhite.us

CONTENTS

Introduction
    Why I requested an independent study (Beginning of Semester)
    Why I am glad I requested an independent study (End of Semester)
Mission Statement and Goals
    Course Mission Statement
    Course Goals and Expected Outcomes
Introduction
Coursera Online Courses
    Introduction to Coursera
    My Coursework for This Semester
    Introduction to Specializations
Specializations
    Data Warehousing and Business Intelligence Specialization
    Machine Learning Specialization
    Data Mining Specialization
    Data Science Specialization
    Algorithms Specialization
CIS 611 Selected Course Materials
    Introduction to CIS 611 Course Materials
edX Online Courses
    Introduction to edX
Additional Coursework
    Introduction to Additional Coursework
Coursera Online Courses
    Data Warehousing and Business Intelligence Specialization
        Course 1: Database Management Essentials
        Course 2: Data Warehouse Concepts, Design, and Data Integration
        Course 3: Relational Database Support for Data Warehouses
        Course 4: Business Intelligence Concepts, Tools, and Applications
        Project: Design and Build a Data Warehouse for Business Intelligence Implementation
    Machine Learning Specialization
        Course 1: Machine Learning Foundations: A Case Study Approach
        Course 2: Machine Learning: Regression
        Course 3: Machine Learning: Classification
        Course 4: Machine Learning: Clustering & Retrieval
        Course 5: Machine Learning: Recommender Systems & Dimensionality Reduction
    Data Mining Specialization
        Course 1: Data Visualization
        Course 2: Text Retrieval and Search Engines
        Course 3: Text Mining and Analytics
        Course 4: Pattern Discovery in Data Mining
        Course 5: Cluster Analysis in Data Mining
    Data Science Specialization
        Course 1: The Data Scientist’s Toolbox
        Course 2: R Programming
        Course 3: Getting and Cleaning Data
        Course 4: Exploratory Data Analysis
        Course 5: Reproducible Research
        Course 6: Statistical Inference
        Course 7: Regression Models
        Course 8: Practical Machine Learning
    Algorithms Specialization
        Course 1: Algorithmic Toolbox
        Course 2: Data Structures
        Course 3: Algorithms on Graphs
        Course 4: Algorithms on Strings
        Course 5: Advanced Algorithms and Complexity
CIS 611 Selected Course Materials
    Database Normalization
    Indexes
    Functional Dependency
    Storage and File System
edX Online Courses
    Course 1: Introduction to Data Storage and Management Technologies
    Course 2: Introduction to Cloud Computing
Additional Coursework
    SAC Application
    Senior Design
Work Experience and Conclusion

COURSE OVERVIEW AND OBJECTIVES

INTRODUCTION

WHY I REQUESTED AN INDEPENDENT STUDY (BEGINNING OF SEMESTER)

I’ll start by saying that this is my senior year at Cleveland State University, so I have no motive or goal in writing this reflection on my independent study other than to give the College of Engineering a firsthand glimpse into the experience of a student. Hopefully, after reading this, the reader will fully realize just how much of an impact a single professor can have on a student’s life, career, and love of engineering. I’d also like to say that I plan on graduating this semester, and so I am not making an attempt to obtain a higher grade for this reflection. This is purely for the edification of the reader, and so that the administration and Dr. Chung know just how much I sincerely appreciate all her hard work and effort this past year.

Last semester (Fall 2015) I took Database Systems (CIS 430) with Dr. Sun Chung. To many undergraduate students, the topic of database systems is one which they do not immediately associate with glamor, fame, or what is referred to as the “sexy” part of computer science.
To be entirely honest, I was one of these students. I was, at the time, working as a web application developer for Parker Hannifin, specifically in front-end technologies. My interest in database systems was, well, non-existent. This course started me down a path that I am certain is going to be my specialization and career. Enterprise database systems, data mining, big data, and all the other specializations under the umbrella of database systems are what I find to be most interesting, challenging, and downright cool, and this is all due to the foundation I received in Dr. Chung’s course.

Dr. Chung brings to the table a multitude of attributes which make her not only a remarkable professor, but also an incredible role model. Her enthusiasm towards the topics she teaches, as well as towards her students, is unwavering. While enrolled in CIS 430 I earned a position on Parker Hannifin’s Data and Business Intelligence team, working on building out their data warehouse. The only reason I was able to transition to this team was the skills I learned from Dr. Chung in CIS 430, as well as the countless discussions we would have after class regarding how I should best prepare myself for the transition (it is important to note that Dr. Chung would oftentimes stay 30, sometimes 40 minutes after class just to explain a topic which we were not even covering in the course). To me, this is truly what it means to be an engineering student. Until this time, I was unable to reach out to professors in this way, and so my learning and appetite for knowledge were curbed.

With all this in mind, when I discovered that I could do an independent study, I may have actually jumped for joy. When I discussed it with Dr. Chung, the topics to be covered weren’t picked arbitrarily. Dr. Chung (knowing her students and their aspirations) decided to put together a curriculum for me that would change my career forever, and for the better.
WHY I AM GLAD I REQUESTED AN INDEPENDENT STUDY (END OF SEMESTER)

Having completed the semester of independent study, I now have a lot more to say about why I am glad I took this course. At the beginning of the semester, Dr. Chung and I had discussions about where I was going in my career and what I wanted to do (something I had never talked to a professor about). With this in mind, she designed a custom curriculum for me.

There are many reasons why one might perform better in an independent study setting than in a classroom setting. Some individuals are better suited for independent learning. Some students prefer to learn at times of the day when courses are not offered (at night, for example). Although these statements are true, the reason I performed better in an independent study setting than in any classroom I have been in at this university was the structure, guidance, and leadership put forth by Dr. Chung. As you will see, we covered a humongous amount of ground this semester, and yes, I used the word ‘humongous’ because we did cover MongoDB as well!

In closing, I hope you enjoy my coverage of what I have learned this semester. If you enjoy reading it even 1% as much as I enjoyed actually doing it, we’ll both be very happy!

MISSION STATEMENT AND GOALS

COURSE MISSION STATEMENT

The main driving forces behind the curriculum put forth by Dr. Chung were my new position on the data warehousing team at Parker Hannifin, and the interest and passion Dr. Chung and I share for the topic of database systems.
The mission statement for this course was to accomplish the following goals:

• Further my education in the field of data warehousing with a focus on practical work, as well as the theory and principles behind it
• Expand my knowledge of the field to include the topics of
    o Big data and cloud computing
    o Machine learning and its role in the ecosystem
    o Relational concepts as they pertain to data warehouses
• Determine which part of the field I am most interested in, and would like to pursue for my master’s degree

COURSE GOALS AND EXPECTED OUTCOMES

For this course, Dr. Chung combined several different learning techniques and mediums to form the curriculum. The goals of this curriculum were to cover a wide range of topics, but to do so in enough detail that I could actually implement them. The expected outcomes were twofold:

1. Know the theory behind each topic, and truly understand why we need the technology, how we got to this point, where it is headed, and how it works on a highly technical level
2. Be able to implement each topic in a realistic setting. Knowing how something works is the first step; the next step for each topic was to implement it

COURSEWORK OVERVIEW

INTRODUCTION

The actual coursework was chosen to cover all the objectives listed above, as well as to be interesting and practical in nature. The major breakdown of the coursework is as follows. Four different and entirely separate sources of education were used for this independent study. Details of each type will be covered in their respective ‘introduction’ sections, but as a high-level overview, they fall into two categories.

The first category is online learning, or e-learning. I took several courses from top universities, all online. This coursework was guided and supplemented by Dr. Chung’s knowledge when we met throughout the semester. The point of the online coursework was to allow me to keep learning every day, even though we met only a few times a week.
I was able to study from, say, 9 PM to 1 AM every day, which allowed me to cover a lot of ground between our meetings. This was immensely helpful.

The second category is learning with Dr. Chung, facilitated in person. The topics covered in person went into more depth than the online courses, and they are covered in detail in the following sections.

Description:
Further my education in the field of data warehousing with a focus on .. tutorial in preparation for a graded assignment involving Pentaho Data Integration. Reading · Training Videos for Connecting to MDX and Excel Files and