DS-203: Programming for Data Science
Lecture Schedule: Slot 5, Mon-Wed 9:30 am to 11:30 am.
Venue: Taught online
Instructors: Manjesh Hanawal, Amit Sethi, Sunita Sarawagi, S. Sudarshan
TAs: Aishwarya Agarwal, Aditya Khanna, Anirudh Mittal, Himanshu Gupta, Kumar Miskin, Manoj Bhadu, Prerak Gandhi, Rishi Agarwal, Vaidehi Patil
Email to reach all TAs and Instructors
For questions of general interest to the class, use Moodle / MS Teams.
Course description
Programming Basics (Python programming, R, Data Structures), Visualization/Plotting, Data Science Libraries (Pandas, PyPlot, matplotlib), Databases, GPUs/CUDA programming, Parallel/distributed computing for data science (Map/Reduce, Spark/Hadoop), and working on the cloud (Amazon Web Services, Google Cloud Platform, Azure, etc.). The course will be programming-heavy, with in-class and take-home programming exercises. A project may optionally be included.
Topics Covered
- Introduction to basic probability
- Introduction to basic statistics
- Basic data understanding
- Performance of Python programs
- Exploratory data analysis and data visualization
- Linear and logistic regression
- Supervised machine learning as a black box
- Deep Learning
- Introduction to software engineering
- Graphical User Interface (GUI) programming
- Introduction to databases
- Parallel query processing
- Cloud services
Eligibility
The course is open to all BTech students.
Prerequisites
None.
Credit/Audit Requirements
Approximate credit structure: performance on assignments will form the majority of the course evaluation.
Text References
- Principles and Techniques of Data Science, by Sam Lau, Joey Gonzalez, and Deb Nolan, 2019.
- Online tutorials on Python and R
- Learning Python, by Mark Lutz, O'Reilly, 2005.
- Python for Data Analysis, by Wes McKinney, O'Reilly, 2013.
- CUDA by Example: An Introduction to General-Purpose GPU Programming, by Jason Sanders, Nvidia, 2010.
- Parallel Computing for Data Science: With Examples in R, C++, and CUDA, by Norman Matloff, CRC Press, Boca Raton.
- Neural Networks and Deep Learning, by Michael Nielsen.