This guide provides links to the "Data Topics" series of workshops by Ryan Womack, Data Librarian.

**New! Integrated guide to all Spring workshop materials - slides, videos, and code available at**

**https://bit.ly/NBL_Data_Science_Workshops_Spring_2024**

**Spring 2024 workshop information now available at:**

**NBL Workshop Calendar - https://libcal.rutgers.edu/calendar/nblworkshops**

**This integrated calendar contains information on all open workshops offered by the New Brunswick Libraries. Topics include Python, R, Digital Humanities, GIS, NVivo, and the Data Science Basics Workshop Series.**

**Tanya Khanna, Data Science Graduate Specialist, will be presenting the following workshops (Videos of the workshops will be added after the events):**

**Workshop materials (Jupyter notebooks and slides) are here: https://github.com/Tanya-Khanna/DataScienceWorkshop_2024_NBL/**

**1. Introduction to Python Programming [Video]:**

This workshop is designed for beginners with little to no experience in programming, aiming to provide a rapid yet comprehensive introduction to the world of Python, one of the most popular and versatile programming languages today. Learners will quickly grasp Python syntax, script execution, and fundamental constructs like variables, data types, and operators. They will also explore control structures like if-else statements, loops, and functions, gaining practical skills in data structures such as lists, tuples, sets, and dictionaries. Additionally, the workshop covers file handling and text processing.
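
As a taste of the constructs listed above, here is a minimal sketch that combines variables, a loop, an if-else branch, a function, and a dictionary. It is not taken from the workshop materials; the function name `word_counts` and the sample sentence are made up for illustration.

```python
def word_counts(text):
    """Count how often each word appears in a string (case-insensitive)."""
    counts = {}                          # dictionary: word -> occurrences
    for word in text.lower().split():    # loop over whitespace-separated words
        if word in counts:               # if-else: update or start the count
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


sentence = "the quick brown fox jumps over the lazy dog"  # illustrative input
counts = word_counts(sentence)
print(counts["the"])                     # 2
print(len(set(sentence.split())))        # 8 distinct words (a set drops duplicates)
```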

**2. Mastering Data Analysis: Pandas and NumPy Essentials [Video]:**

This workshop is designed to equip learners with powerful tools for data analysis in Python. Participants will delve into the world of NumPy, exploring its efficient arrays and array operations, which form the backbone of numerical computing in Python. The workshop then shifts to Pandas, where learners will get hands-on experience with its fundamental data structures - Series and DataFrame. This comprehensive session is ideal for anyone looking to enhance their data analysis skills, offering the tools needed to unlock insights from data with efficiency and precision.
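
The following sketch hints at how NumPy arrays and pandas DataFrames fit together. It assumes both libraries are installed; the city labels and temperature values are invented and do not come from the workshop notebooks.

```python
import numpy as np
import pandas as pd

# NumPy: arithmetic applies element-wise to the whole array, no loop needed
temps_c = np.array([20.0, 22.5, 19.0, 25.0])   # made-up temperatures
temps_f = temps_c * 9 / 5 + 32

# pandas: a DataFrame is a table whose columns are Series
df = pd.DataFrame({"city": ["A", "B", "C", "D"], "temp_c": temps_c})
df["temp_f"] = temps_f                 # add a column from the NumPy array
warm = df[df["temp_c"] > 21]           # boolean filtering keeps rows B and D

print(warm)
print(df["temp_c"].mean())             # 21.625
```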

**3. Unveiling Data Stories: Python for Visualization and Exploration [Video]:**

This workshop is designed to guide participants through the process of revealing hidden stories in data using Python. It focuses on using Matplotlib and Seaborn, two prominent visualization tools, for effective exploratory data analysis (EDA). This workshop emphasizes the creation of engaging visual narratives, enabling participants to transform complex data insights into compelling and understandable visual formats.

**4. Mathematical Foundations for Data Science [Video]:**

This workshop offers a brief yet comprehensive overview of essential mathematics for data science. It covers foundational statistics and probability, crucial for model understanding, and basic hypothesis testing techniques. It also introduces linear algebra concepts like vectors and matrices, alongside fundamental calculus for derivatives and integrals.

**5. Introduction to Machine Learning: Supervised Learning [Video]:**

This workshop is tailored for beginners in machine learning. It focuses on supervised learning algorithms that are a cornerstone of machine learning, where the algorithm learns from labeled training data, helping to predict outcomes for unforeseen data. Classification and Regression will be introduced. Participants will learn about key algorithms like Linear Regression and Decision Trees, exploring how these methods enable machines to learn from and make predictions based on data.

**6. Introduction to Machine Learning: Unsupervised Learning [Video]:**

This workshop is designed to introduce the concepts of unsupervised learning, a branch of machine learning where algorithms infer patterns from unlabelled data. The course covers clustering methods like K-means and DBSCAN, used to identify inherent groupings in data. It also explores dimensionality reduction techniques such as PCA, which simplify complex data sets while preserving their key features. Additionally, the session introduces association rules, a method for finding interesting relationships within data sets. This workshop is ideal for those interested in learning how to extract insights from data without predetermined labels or categories.
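
To make the clustering idea concrete, here is a minimal sketch of K-means on one-dimensional data using only the standard library. The function name `kmeans_1d`, the data points, and the starting centers are illustrative assumptions; the workshop itself may well use a library such as scikit-learn instead.

```python
def kmeans_1d(points, centers, iterations=10):
    """K-means (Lloyd's algorithm) on 1-D data, starting from given centers."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        centers = [
            sum(c) / len(c) if c else old    # keep an empty cluster's center
            for c, old in zip(clusters, centers)
        ]
    return centers, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]    # made-up data with two groups
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)     # [1.5, 10.0]
print(clusters)    # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

No labels are supplied anywhere: the two groups emerge from the distances between points alone.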

**7. Introduction to Deep Learning [Video]:**

This workshop offers an introduction to the fundamentals of deep learning, a highly influential branch of artificial intelligence. This session focuses on the core concepts of neural networks, including feedforward neural networks, the simplest type of artificial neural network architecture. The course also covers convolutional neural networks (CNNs), essential for image and video recognition, and recurrent neural networks (RNNs), which are crucial for handling sequential data like text and speech.

**8. Deep Dive into Natural Language Processing [Video]:**

Are you eager to learn how to communicate with computer systems using Natural Language Processing (NLP) techniques, or to make machines understand human sentiments? Do you aspire to build intelligent applications like Siri, Alexa, or chatbots, even if you're starting from scratch? This workshop introduces Natural Language Processing (NLP), teaching you to preprocess text, analyze sentiments, model topics, and use language generation models. It's perfect for anyone eager to build applications that interact naturally with human language.

**9. Large Language Models and ChatGPT [Video]:**

This workshop offers a thorough exploration of cutting-edge language models, with a spotlight on ChatGPT. Attendees will delve into the design, training techniques, and practical uses of these models. Discussions on ethical usage and best practices will be a key part of the learning experience. By the workshop's end, participants will gain a deep understanding of large language models and how to effectively apply ChatGPT and similar technologies.

**2023 Python workshop materials from Harshith Thonupunoori are available at**

**https://github.com/NBLGraduateSpecialistProgram/Python2022_Thonupunoori**

**Python Basics and Data Exploration** This workshop will be an introduction to fundamental concepts such as variable assignment, data types, basic calculations, working with strings and lists, control structures (e.g. for-loops), and functions.

**Data Manipulation and Analysis with Python** In this workshop, we will dive into the world of arrays and data frames using the NumPy and pandas libraries. We'll cover data cleaning and pre-processing, joining and merging, group operations, and more.

**Data Visualization with Python (Video)** This workshop will give an introduction to data visualization with matplotlib and seaborn, popular plotting libraries in Python.

**Machine Learning: Neural Networks (Video)** This workshop will give an introduction to machine learning methods with neural networks.

**Machine Learning: Deep Learning and Convolutional Neural Networks (Video)** This workshop will discuss Deep Learning and Convolutional Neural Networks as used in Machine Learning.

**Machine Learning: Decision Trees and Random Forests (Video)** This workshop will give an introduction to machine learning methods with decision trees and random forests.

Code files and PDF handouts by Data Science Graduate Specialist Robert Palmere are available at

**https://github.com/NBLGraduateSpecialistProgram/DataScience2022_Palmere**

**These cover the following topics:**

- Introduction to Python and Introduction to C (session 1_C)
- Data Manipulation and Analysis with Python and Basic Data Manipulation and Analysis with C/C++ (session 2_C)
- Data Visualization with Python and Data Visualization with C (session 3_C)
- additional practice problems
- Data Visualization with Python, continued and Object-oriented Programming with C++ (session 5_C)
- Cython and Python
- Data Mining in the Protein Data Bank
- Small Applications Development with Python
- Introduction to Machine Learning with Python
- Molecular Dynamics with Python

To download zipped files from GitHub repositories, click on the green "Code" button (formerly labeled "Clone or download") near the upper right of the repository page and choose "Download ZIP". Use Jupyter Notebook to open the .ipynb files in an interactive environment.

- Workshop 1: Python Basics and Data Exploration (GitHub repository for Workshop 1 files)
- Workshop 2: Data Manipulation and Analysis with Python (GitHub repository for Workshop 2 files)
- Workshop 3: Data Visualization and Machine Learning with Python (GitHub repository for Workshop 3 files)
- Workshop 4: Statistical Inference with Python (GitHub repository for Workshop 4 Jupyter Notebook)
- Workshop 5: Data Science with Python, part 1 (GitHub repository for Workshop 5 Jupyter Notebook and Data)
- Workshop 6: Data Science with Python, part 2 (GitHub repository for Workshop 6 Jupyter Notebook and Data)

There are two separate series of Python workshops listed here, with different instructors and different content. Sly Zhong's series is more geared to beginners with the language (labeled "Beginners"), while Sanket Badhe's series will move at a faster pace (labeled "Accelerated").

**YouTube Playlist for all of Ziqiu (Sly) Zhong's Python workshop series.**

Python Basics and Data Exploration (Accelerated 1)

**Recording of Session** (Instructor, Sanket Badhe)

This workshop will be an accelerated introduction to fundamental concepts such as variable assignment, data types, basic calculations, working with strings and lists, control structures (e.g. for-loops), functions.

Python Basics and Data Exploration (Beginners 1)

**Recording of Session** (Instructor, Sly Zhong)

This workshop will be a more deliberate introduction to fundamental concepts such as variable assignment, data types, basic calculations, working with strings and lists, control structures (e.g. for-loops), functions.

Data Manipulation and Analysis with Python (Accelerated 2)

- Recording of Session (Instructor, Sanket Badhe)

In this workshop, we will dive into the world of arrays and data frames using the NumPy and pandas libraries. We'll cover data cleaning and pre-processing, joining and merging, group operations, and more. If you work with tabular data, this workshop is for you!

Data Manipulation and Analysis with Python (Beginners 2)

**Recording of Session** (Instructor, Sly Zhong)

In this workshop, we will dive into the world of arrays and data frames using the NumPy and pandas libraries. We'll cover data cleaning and pre-processing, joining and merging, group operations, and more.

Data Visualization with Python (Accelerated 3)

This workshop will give an introduction to data visualization with matplotlib and seaborn, popular plotting libraries in Python.

- Recording of Session (Instructor, Sanket Badhe)

Data Visualization (Beginners 3)

**Recording of Session** (Instructor, Sly Zhong)

This workshop will continue with the NumPy and pandas libraries. Data visualization with matplotlib, a popular plotting library in Python, will also be covered: turn data into line, bar, and scatter plots, among others. Environmental Science and Economics data will be used as examples.

Cryptocurrency API, Visualization, and Comparison project

Utilizing NumPy, pandas and matplotlib, this workshop will show how to make a program that can compare the price, Log Returns, SMA (Simple Moving Average) of Bitcoin and Ethereum, and predict which one is a better investment choice with Python. A real-time cryptocurrency interactive API will also be introduced in this workshop.
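
The two indicators named above can be sketched in a few lines. This is a standard-library illustration under stated assumptions, not the workshop's own code: the helper names `log_returns` and `simple_moving_average` are hypothetical, and the prices are invented rather than real market data.

```python
import math


def log_returns(prices):
    """Log return between consecutive prices: ln(p_t / p_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def simple_moving_average(prices, window):
    """Mean of each run of `window` consecutive prices."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


prices = [100.0, 110.0, 121.0, 110.0]             # made-up prices
print(log_returns(prices))                        # one value per price change
print(simple_moving_average(prices, window=2))    # [105.0, 115.5, 115.5]
```

With pandas, the same quantities are usually computed on a price column with `rolling(window).mean()` for the SMA and `np.log(prices / prices.shift(1))` for log returns.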

Statistical Hypothesis Tests - Basic Concepts and Implementation

This workshop covers the basic concepts behind the most commonly used statistical tests, including null hypothesis testing, critical values, p-values, the Z-test, the t-test, and the chi-square test. We will also work through examples of how to implement these tests on a given data set.
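
As one illustration of how a test statistic and p-value relate, here is a minimal sketch of a two-sided one-sample Z-test using only the standard library. The function name `one_sample_z_test`, the measurements, the hypothesized mean, and the assumption of a known population standard deviation are all made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0, with known sigma."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value


sample = [52, 48, 55, 51, 54, 50, 53, 49, 56]      # made-up measurements
z, p = one_sample_z_test(sample, mu0=50, sigma=3)
alpha = 0.05                                       # significance level
print(f"z = {z:.2f}, p = {p:.4f}")                 # z = 2.00, p = 0.0455
print("reject H0" if p < alpha else "fail to reject H0")
```

When the population standard deviation is unknown, a t-test is the usual choice; SciPy provides one as `scipy.stats.ttest_1samp`.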

Intro to Tableau 1

The workshop will introduce the basics of using Tableau for data visualization, including design principles for presenting quantitative and qualitative data and methods for meaningful display.

Statistical Inference with Python

**Recording of Session** (Instructor, Sanket Badhe)

In this workshop, we will explore basic principles behind using data for estimation and for assessing theories. The workshop will focus on inference procedures, constructing confidence intervals, and hypothesis testing.
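
A confidence interval for a mean can be sketched as follows. This version uses the normal approximation from the standard library; for a sample this small, a t-based interval would normally be preferred, so treat it as an illustration only. The function name and the measurements are assumptions, not workshop material.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    center = mean(sample)
    std_error = stdev(sample) / sqrt(len(sample))    # sample std. dev. / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return center - z * std_error, center + z * std_error


sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]    # made-up measurements
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Raising `confidence` to 0.99 widens the interval, which is the trade-off between certainty and precision.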

Supervised Learning - Regression

Instructor, Sanket Badhe

In this workshop, we will give an introduction to machine learning, covering both supervised and unsupervised learning. Next, we will discuss different methods for splitting data into training and test sets. Finally, we will deepen our understanding of regression, specifically simple linear regression and multiple linear regression.
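
The split-then-fit workflow can be sketched with the standard library alone. The data below is synthetic (it follows y = 2x + 1 exactly), and `fit_line` is a hypothetical helper; in practice one would likely reach for scikit-learn's `train_test_split` and `LinearRegression`.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


data = [(x, 2 * x + 1) for x in range(10)]   # synthetic, noise-free data
random.Random(0).shuffle(data)               # fixed seed: repeatable split
train, test = data[:7], data[7:]             # 70/30 train/test split

slope, intercept = fit_line(*zip(*train))    # fit on training data only
# Mean squared error on the held-out test points
mse = sum((slope * x + intercept - y) ** 2 for x, y in test) / len(test)
print(slope, intercept, mse)                 # close to 2, 1, and 0
```

Evaluating on points the model never saw is what makes the test error an honest estimate of how it will perform on new data.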

Supervised Learning - Classification 1

**Recording of Session** (Instructor, Sanket Badhe)

This workshop will first give an introduction to classification problems and then discuss classification algorithms such as K Nearest Neighbour and logistic regression. The latter half of the workshop will focus on classification metrics such as the confusion matrix, accuracy, precision, and recall.
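
The metrics mentioned above all derive from the four cells of a binary confusion matrix. The sketch below computes them by hand; the labels and predictions are invented, and `confusion_counts` is a hypothetical helper rather than a library function (scikit-learn offers equivalents in `sklearn.metrics`).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


y_true = [1, 0, 1, 1, 0, 0, 1, 0]     # made-up ground-truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]     # made-up model predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)    # share of all predictions that are right
precision = tp / (tp + fp)            # share of predicted positives that are right
recall = tp / (tp + fn)               # share of actual positives that were found
print(tp, fp, fn, tn)                 # 3 1 1 3
print(accuracy, precision, recall)    # 0.75 0.75 0.75
```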

Exercise and Practice in Python (Beginner 4)

This workshop will go over some exercises and practice questions using Python for beginners. If you’re starting out with Python, this workshop is a good way to test your knowledge and learn how to make some small programs.

Intro to Tableau 2

More Tableau functions and data visualization options will be covered in this workshop.

Data Science with Python, part 2

This workshop focuses on advanced supervised learning methods for both classification and regression (Decision Tree, Random Forest, Support Vector Machine, Ensemble Learning, Neural Network). We will apply all of these techniques to a dataset and compare the results.

Neural Networks

This workshop describes neural network techniques for data analysis.

Interaction with API in Economics

An API, or application programming interface, is a common tool for interacting with data on the web. This workshop will present how APIs are used in the Finance (Equity and Cryptocurrency) and Economics (FRED) industries.

Statistical Hypothesis Tests in Python/SAS/R

This workshop will introduce how to run the most commonly used statistical tests in different programming languages, including Python and R, and will compare the languages.

**Spark Introduction**

This workshop will introduce you to PySpark and its features and components.

Data is all around us - in every industry and academic field, behind every online purchase recommendation and driving route calculation. Sometimes we have more data than we know what to do with. If solving data problems intrigues you (or if you just need some data for a class project...), check out the links below.

- 18 places to find data sets for data science projects: As the title suggests, this informative blog post from Dataquest details 18 recommended data sources for tasks ranging from data visualization to processing of streaming data.
- Kaggle: Kaggle is a dataset repository, data science competition host, tutorial provider, and more. It has an active community that discusses and contributes solutions to various data science problems.
- Data.gov: A home for U.S. government open data. Topics include climate, education, finance, and many more.

Four popular options for installing or running Python:

- Download Anaconda at https://www.anaconda.com/download/. Anaconda comes with many useful packages including different integrated development environments (Jupyter, Spyder, etc.), libraries for analytics and scientific computing (NumPy, SciPy, pandas, etc.), libraries for visualization (matplotlib, bokeh, etc.), and libraries for machine learning (such as scikit-learn). The installation should only take ~ 5 minutes, but it is a fairly large software package (~ 3 GB), so make sure you have enough disk space.
- Download WinPython from https://winpython.github.io/. WinPython is similar to Anaconda; it has the added benefit that it can be run off a USB stick if you are using a public computer and can't install new programs, but it may have fewer included packages than Anaconda [1].
- Download directly from https://www.python.org/downloads/ and use the bundled IDLE shell and editor. This option comes with the standard library, but you will need to install libraries such as NumPy, SciPy, and matplotlib yourself.
- Use Google Colab through your ScarletMail or personal Gmail account (https://colab.research.google.com/). Google Colab is a Jupyter notebook environment that requires no setup and runs entirely in the cloud.

[1] S. Byrnes, "Python for scientific computing: Where to start," *Steve Byrnes's Homepage*, Oct. 2017. [Online]. Available: http://sjbyrnes.com/python/. [Accessed 27 Apr. 2018].

Since Python is open source, there are abundant online resources to help learners find their way around the language. If you have a specific programming task you need help to achieve, a Google search is often the best way to start. Here is a list of resources you may find helpful if you're interested in a particular topic!

**General Python Learning**

- Python 3 Documentation: Tutorials, standard library reference, installation instructions, and more from the Python Software Foundation.
- Automate the Boring Stuff: An excellent introduction to Python for those who are new to programming. Each chapter comes with accompanying YouTube videos and practice problems.
- A Byte of Python: Another Python tutorial for beginners.
- Python for Informatics: Another tutorial with exercises for new programmers. Note that the examples may use syntax from Python 2.x rather than the current Python 3.x versions, but there are footnotes with corresponding Python 3 syntax.
- Python for Ecologists: A tutorial covering everything from introductory Python concepts to pandas data analysis to matplotlib visualization. The entire lesson set is meant to be covered in a day, so this is a quick guide for those who wish to begin using Python for data analysis immediately.

**Visualizing Code Execution**

- Python Tutor: A step-by-step visualization tool to help you understand how Python executes any piece of code.

**Specific Topics in Python**

- Strings (Tutorials Point): A useful reference on string indexing, escape characters, string operators, string methods, and more!
- Lists (Tutorials Point): A useful reference on list indexing, slicing, list operators, list methods and built-in functions.
- Reading/Writing Files in Python: An overview of how to read and write files in Python.

**NumPy and Pandas (Data Manipulation & Analysis)**

- NumPy (and SciPy) Documentation: User guides and reference guides for each version of NumPy (and SciPy) for scientific computing in Python.
- Wes McKinney's *Python for Data Analysis* eBook: A practical textbook on using NumPy and pandas written by the main developer of pandas, Wes McKinney. The link leads to the eBook accessed through the Rutgers library system; if you do not have access, Google "Wes McKinney Python for Data Analysis" and it should come up.
- Brandon Rhodes's PyCon 2015 Pandas Tutorial: An excellent tutorial on using pandas for data manipulation and analysis. The GitHub page includes several exercises and an embedded YouTube video of Brandon conducting the tutorial. The video is ~ 3 hours long, but completely worth it.

**Data Visualization**

- Matplotlib Documentation: User's guide to matplotlib, a popular and well-established plotting library in Python. Check out the pyplot tutorial for a good overview of the main plotting module.
- Nicolas Rougier's Matplotlib Tutorial: An introduction to plotting with matplotlib with some nice examples (including examples of animated figures).
- Visualization with Matplotlib: A chapter on visualization from Jake VanderPlas's *Python Data Science Handbook*. This excerpt takes an in-depth look into matplotlib's functionalities and gives a brief overview of seaborn, a visualization library built on top of matplotlib.
- Seaborn Documentation: Reference, gallery, tutorial, and more for the seaborn visualization library, which allows for convenient creation of "attractive statistical graphics". If you're not sure where to start, click on the "Seaborn tutorial" link for an introduction to the API.

**Machine Learning**

- Scikit-learn Documentation: User guide, tutorials, API reference, and more for the scikit-learn machine learning library. Check out the user guide for detailed descriptions of algorithms and how to implement them.
- SciPy 2017 Scikit-learn Tutorial: A broad overview of how scikit-learn can be used for machine learning topics including classification, regression, clustering, text feature extraction, cross-validation, evaluation metrics, and much more. Take a look at the files under the "notebooks" folder and/or watch the YouTube videos of Andreas Mueller and Alex Gramfort conducting the tutorial (links under the "Schedule" sub-heading).

This guide was originally created by Miranda So as part of the inaugural cohort of the Graduate Specialist Program. To follow Miranda's work, take a look at her GitHub page here.

Hang Miao served as Quantitative Data Graduate Specialist for the 2018-2019 Academic Year, and updated and expanded the workshop content. To follow Hang's work, see his Github page.

Further workshop content, including topics on statistical inference, machine learning, and HPC with Amarel, was added by Sanket Badhe (Github) and Ziqiu (Sly) Zhong (Github), Quantitative Data Graduate Specialists from Fall 2019 to Fall 2020.

From Spring 2021 to Spring 2022, Robert Palmere (Github) served as Data Science Graduate Specialist, contributing Python and C workshop content.

From Fall 2022 to Fall 2023, Harshith Thonupunoori served as Data Science Graduate Specialist, contributing Python workshop content. See his Github page.

Also see the general Github archive for prior NBL Graduate Specialist Material, and the Data Science Basics page for additional coverage of Python and other data tools.

- Last Updated: Aug 6, 2024 11:29 AM
- URL: https://libguides.rutgers.edu/data_topics

Rutgers University Libraries
169 College Ave

New Brunswick, NJ 08901-1163

