Rutgers University Libraries

Graduate Specialist Program (New Brunswick Libraries): Python

Home page for the New Brunswick Libraries' Graduate Specialist Program

Python Workshop Materials

To download a GitHub repository as a zip file, click the green "Clone or download" button in the upper right of the repository page and choose "Download ZIP". Open the .ipynb files in Jupyter Notebook to run them interactively.
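If you prefer to unpack the downloaded archive from Python, the sketch below uses only the standard library to extract it and list the notebooks inside. The archive and folder names in the commented usage line are hypothetical placeholders; substitute whatever GitHub saved on your machine.

```python
import zipfile
from pathlib import Path


def extract_notebooks(zip_path, dest_dir):
    """Unzip a downloaded repository archive and list the notebooks it contains.

    Returns the .ipynb paths relative to dest_dir, sorted, with forward slashes.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.ipynb"))


# Hypothetical usage: the file name depends on the repository you downloaded.
# notebooks = extract_notebooks("python-workshop-master.zip", "python-workshop")
# print(notebooks)
```

After extracting, running `jupyter notebook` from a terminal in that folder opens the notebook dashboard in your browser.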

Data, Data Everywhere

Data is all around us: in every industry and academic field, behind every online purchase recommendation and every calculated driving route. Sometimes we have more data than we know what to do with. If solving data problems intrigues you (or if you just need some data for a class project...), check out the links below.

Spring 2019 Workshop Schedule

☞ RSVP for the Python workshops.

Each workshop is offered at both Alexander Library and LSM, with identical content. Participants in LSM workshops must bring their own laptops. At Alexander, you can either bring your own laptop or use the desktop computers in the lab.

Python Basics and Data Exploration

  • Friday, February 1 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)
  • Wednesday, February 13 – 12:00-2:00 pm, LSM Conference Room (Instructor, Aditya Vas)

This workshop is an accelerated introduction to fundamental concepts such as variable assignment, data types, basic calculations, strings and lists, control structures (e.g., for-loops), and functions. We will also start working with pandas, a popular Python data science library, to explore a dataset on foodborne outbreaks reported to the CDC.
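The fundamentals listed above fit together in a few lines. This sketch uses a small hand-made list of records as a stand-in for the CDC dataset; the field names and numbers are invented for illustration only.

```python
# A few made-up outbreak records (NOT real CDC data), stored as a list of dicts.
records = [
    {"state": "New Jersey", "year": 2015, "illnesses": 12},
    {"state": "New York", "year": 2015, "illnesses": 30},
    {"state": "new jersey", "year": 2016, "illnesses": 7},
]

# Variable assignment and basic data types
outbreak_count = len(records)                                     # int
average = sum(r["illnesses"] for r in records) / outbreak_count   # float (true division)
label = "outbreaks"                                               # str
has_data = outbreak_count > 0                                     # bool


def total_illnesses(rows, state):
    """Add up illnesses for one state using a for-loop and an if-test.

    The comparison is case-insensitive, so "new jersey" matches "New Jersey".
    """
    total = 0
    for row in rows:
        if row["state"].lower() == state.lower():
            total += row["illnesses"]
    return total


# String formatting with an f-string
print(f"{outbreak_count} {label}, {average:.1f} illnesses on average")
print("New Jersey total:", total_illnesses(records, "New Jersey"))
```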

Data Manipulation and Analysis with Python

  • Friday, February 8 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)
  • Monday, February 18 – 3:15-5:15 pm, LSM Conference Room (Instructor, Aditya Vas)

In this workshop, we will dive into the world of arrays and data frames using the NumPy and pandas libraries. We'll cover data cleaning and pre-processing, joining and merging, group operations, and more. If you work with tabular data, this workshop is for you!
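As a rough preview of cleaning, joining, and group operations, here is a pandas sketch on two tiny tables. The column names and all values are made up for illustration and are not the workshop's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables with a missing state and a missing count.
outbreaks = pd.DataFrame({
    "state": ["NJ", "NJ", "NY", "PA", "NY", None],
    "year": [2015, 2016, 2015, 2016, 2016, 2015],
    "illnesses": [12, np.nan, 30, 8, 15, 4],
})
populations = pd.DataFrame({
    "state": ["NJ", "NY", "PA"],
    "population_millions": [8.9, 19.5, 12.8],  # made-up illustrative values
})

# Cleaning: drop rows with no state, treat a missing illness count as 0.
clean = outbreaks.dropna(subset=["state"]).copy()
clean["illnesses"] = clean["illnesses"].fillna(0).astype(int)

# Joining: attach each state's population (inner join on the shared key).
merged = clean.merge(populations, on="state", how="inner")

# Group operations: total illnesses per state, then a per-million rate.
by_state = merged.groupby("state").agg(
    total_illnesses=("illnesses", "sum"),
    population_millions=("population_millions", "first"),
)
by_state["per_million"] = by_state["total_illnesses"] / by_state["population_millions"]
print(by_state)
```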

Data Visualization and Machine Learning with Python

  • Friday, February 15 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)
  • Monday, February 25 – 1:30-3:30 pm, LSM Conference Room (Instructor, Aditya Vas)

Interested in finding patterns and predicting unknown attribute values in your data? Join us for an overview of machine learning techniques implemented using the scikit-learn library. We'll also learn how to do data visualization with matplotlib, a popular plotting library in Python.
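A minimal sketch of that workflow, assuming scikit-learn's bundled iris dataset as a stand-in for your own data and k-nearest neighbors as an example classifier (the workshop may use different models). It fits the model, scores it on held-out samples, and saves a matplotlib scatter plot to a hypothetical file name, `iris_knn.png`.

```python
import matplotlib
matplotlib.use("Agg")  # render to image files without needing a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset and hold out 30% of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Fit a classifier and measure accuracy on the held-out samples.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Plot two of the four features, colored by predicted class.
y_pred = model.predict(X_test)
fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(X_test[:, 2], X_test[:, 3], c=y_pred, cmap="viridis")
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.set_title("k-NN predictions on held-out iris samples")
fig.colorbar(points, ax=ax, label="predicted class")
fig.savefig("iris_knn.png", dpi=100)
plt.close(fig)
```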

Data Scraping: Interaction with APIs with Python

  • Friday, February 22 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)
  • Wednesday, March 6 – 12:00-2:00 pm, LSM Conference Room (Instructor, Aditya Vas)

This workshop shows how to use Python to interact with third-party APIs for data collection. We will introduce different types of APIs with real applications and discuss examples such as the REST APIs for FRED and Quandl. We will also demonstrate in detail a project that retrieves data from the FRED API and merges it with historical data.
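The sketch below shows the request, parse, and merge pattern using only the standard library. It assumes FRED's `series/observations` endpoint returning JSON in which missing values appear as "."; `UNRATE` is just an example series ID, the API key is a placeholder you would obtain from FRED, and the sample payload and "historical" values are made up so the code runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_url(series_id, api_key, start=None):
    """Build a request URL for one FRED series, asking for JSON output."""
    params = {"series_id": series_id, "api_key": api_key, "file_type": "json"}
    if start is not None:
        params["observation_start"] = start  # date string, YYYY-MM-DD
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"


def fetch_json(url):
    """Download a URL and return the body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_observations(payload):
    """Convert a FRED JSON response into {date: float}, skipping '.' (missing)."""
    data = json.loads(payload)
    return {
        obs["date"]: float(obs["value"])
        for obs in data["observations"]
        if obs["value"] != "."
    }


def merge_by_date(new, historical):
    """Inner-join two {date: value} dicts on the dates present in both."""
    shared = sorted(new.keys() & historical.keys())
    return {date: (historical[date], new[date]) for date in shared}


# Offline demo with a made-up payload shaped like a FRED response.
sample_payload = json.dumps({"observations": [
    {"date": "2019-01-01", "value": "4.0"},
    {"date": "2019-02-01", "value": "."},
    {"date": "2019-03-01", "value": "3.8"},
]})
historical = {"2019-01-01": 4.1, "2019-03-01": 3.9}  # hypothetical local data

url = build_fred_url("UNRATE", "YOUR_API_KEY", start="2019-01-01")
# With a real key you would call: payload = fetch_json(url)
new_values = parse_observations(sample_payload)
print(merge_by_date(new_values, historical))
```

Keeping the URL building, parsing, and merging in separate functions makes each step easy to test without a network connection.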

Data Mining: Regression and Classification with Python

  • Friday, March 1 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)
  • Monday, March 11 – 1:30-3:30 pm, LSM Conference Room (Instructor, Aditya Vas)

Traditional least-squares estimation and k-nearest neighbors (KNN) suffer from severe overfitting when the dataset has high-dimensional features. Modern data mining techniques, such as the lasso for regression and support vector machines (SVM) for classification, give better estimates in such settings. This workshop shows how the lasso and SVM work in Python and compares the lasso with least-squares estimation, and SVM with KNN, in the high-dimensional setting.
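Here is a sketch of that comparison on synthetic data, assuming scikit-learn's `LinearRegression`, `Lasso`, `SVC`, and `KNeighborsClassifier`. The data sizes, the penalty `alpha=0.1`, and the other settings are arbitrary choices for illustration, not the workshop's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# --- Regression: more features (p=200) than training rows (n=60) ---
n_train, n_test, p = 60, 200, 200
beta = np.zeros(p)
beta[:5] = 3.0  # only 5 features actually matter (sparse truth)
X = rng.normal(size=(n_train + n_test, p))
y = X @ beta + rng.normal(scale=0.5, size=n_train + n_test)
X_tr, X_te, y_tr, y_te = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

ols = LinearRegression().fit(X_tr, y_tr)  # least squares (min-norm fit when p > n)
lasso = Lasso(alpha=0.1, max_iter=10000).fit(X_tr, y_tr)  # L1-penalized fit
ols_mse = mean_squared_error(y_te, ols.predict(X_te))
lasso_mse = mean_squared_error(y_te, lasso.predict(X_te))
print(f"least squares test MSE: {ols_mse:.2f}")
print(f"lasso test MSE:         {lasso_mse:.2f}")
print("nonzero lasso coefficients:", np.count_nonzero(lasso.coef_))

# --- Classification: 500 features, only 5 of them informative ---
Xc, yc = make_classification(
    n_samples=300, n_features=500, n_informative=5, n_redundant=0, random_state=0
)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.3, random_state=0)
knn_acc = KNeighborsClassifier(n_neighbors=5).fit(Xc_tr, yc_tr).score(Xc_te, yc_te)
svm_acc = SVC(kernel="linear").fit(Xc_tr, yc_tr).score(Xc_te, yc_te)
print(f"KNN accuracy: {knn_acc:.2f}   linear SVM accuracy: {svm_acc:.2f}")
```

In the regression part there are more features than training rows, so least squares can fit the training data exactly and tends to generalize poorly, while the L1 penalty drives most lasso coefficients to zero. How the SVM and KNN accuracies compare will depend on the data and the settings chosen.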

CANCELLED! Machine Learning: Building a Neural Network from Scratch

  • Friday, April 12 – 12:30-2:30 pm, Alexander Library Room 413 (Instructor, Hang Miao)

This workshop is intended to help you develop a neural network mindset and hone your intuitions about deep learning. We start by building a logistic regression model as a baseline for recognizing cats. Then we develop a neural network (NN) with a single hidden layer and extend it to a deep NN by adding as many hidden layers as you want. Hopefully, you will see an improvement in accuracy over the logistic regression baseline.
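A small NumPy sketch of the same progression, with one substitution: instead of cat images it uses hypothetical 2-D points labeled by an XOR-style rule that no straight line can separate. It trains a logistic regression baseline and a one-hidden-layer network by full-batch gradient descent; the layer size, learning rates, and step counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: label is 1 when x1 and x2 share a sign (XOR-style).
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(X, y, lr=0.5, steps=2000):
    """Baseline: logistic regression trained by gradient descent."""
    w = np.zeros((X.shape[1], 1))
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y  # dLoss/dz for sigmoid + cross-entropy
        w -= lr * X.T @ grad / len(X)
        b -= lr * grad.mean()
    return lambda Xn: sigmoid(Xn @ w + b)


def train_one_hidden_layer(X, y, hidden=16, lr=1.0, steps=5000):
    """One tanh hidden layer, sigmoid output, mean cross-entropy loss."""
    W1 = rng.normal(size=(X.shape[1], hidden)) / np.sqrt(X.shape[1])
    b1 = np.zeros((1, hidden))
    W2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
    b2 = np.zeros((1, 1))
    for _ in range(steps):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        p = sigmoid(H @ W2 + b2)
        # Backward pass (backpropagation)
        dZ2 = (p - y) / len(X)
        dW2 = H.T @ dZ2
        db2 = dZ2.sum(axis=0, keepdims=True)
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0, keepdims=True)
        # Gradient descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2)


def accuracy(predict, X, y):
    return float(((predict(X) > 0.5) == (y > 0.5)).mean())


logreg_acc = accuracy(train_logistic(X, y), X, y)
nn_acc = accuracy(train_one_hidden_layer(X, y), X, y)
print(f"logistic regression: {logreg_acc:.2f}   one hidden layer: {nn_acc:.2f}")
```

Accuracies here are measured on the training points only. Going deeper means repeating the tanh forward step and its matching backward step once per extra hidden layer.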

Getting Started

Three popular options for installing Python on your computer:

  1. Download Anaconda at https://www.anaconda.com/download/. Anaconda comes with many useful packages, including several integrated development environments (Jupyter, Spyder, etc.), libraries for analytics and scientific computing (NumPy, SciPy, pandas, etc.), libraries for visualization (matplotlib, bokeh, etc.), and libraries for machine learning (such as scikit-learn). Installation should take only about 5 minutes, but it is a fairly large software package (about 3 GB), so make sure you have enough disk space.
  2. Download WinPython from https://winpython.github.io/. WinPython is similar to Anaconda and has the added benefit that it can run from a USB stick if you are using a public computer and can't install new programs, but it may include fewer packages than Anaconda [1].
  3. Download Python directly from https://www.python.org/downloads/. This version comes with the IDLE shell and editor and the standard library, but you will need to install libraries such as NumPy, SciPy, and matplotlib yourself.
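Whichever option you choose, a quick way to confirm that the libraries used in these workshops are available is to ask Python's standard `importlib` machinery whether each module can be found. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys


def check_packages(names):
    """Return {module_name: True/False} for whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python version:", sys.version.split()[0])
for name, found in check_packages(["numpy", "scipy", "pandas", "matplotlib", "sklearn"]).items():
    print(f"{name:<12} {'found' if found else 'missing'}")
```

A missing package can usually be installed from a terminal with `pip install <package>` (or `conda install <package>` in Anaconda); the pip package name for `sklearn` is `scikit-learn`.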

 

[1] S. Byrnes, "Python for scientific computing: Where to start," Steve Byrnes's Homepage, Oct. 2017. [Online]. Available: http://sjbyrnes.com/python/. [Accessed 27 Apr. 2018].

Learning Resources

Since Python is open source, there are abundant online resources to help learners find their way around the language. If you need help with a specific programming task, a Google search is often the best place to start. Here is a list of resources you may find helpful if you're interested in a particular topic.

General Python Learning

Visualizing Code Execution

Specific Topics in Python

NumPy and Pandas (Data Manipulation & Analysis)

Data Visualization

Machine Learning

Quantitative Data Graduate Specialist

Credits

This guide was created by Miranda So as a member of the inaugural cohort of the Graduate Specialist Program. To follow Miranda's work, take a look at her GitHub page.

Rutgers, The State University of New Jersey, an equal access/equal opportunity institution. Individuals with disabilities are encouraged to direct suggestions, comments, or complaints concerning any accessibility issues with Rutgers web sites to: accessibility@rutgers.edu or complete the Report Accessibility Barrier / Provide Feedback Form.