911 Calls – Exploratory Data Analysis

Skills: Python, Pandas, Seaborn

Introduction: For this project we’ll analyze the 911 call dataset from Kaggle. The data contains the following fields:

lat: String variable, Latitude
lng: String variable, Longitude
desc: String variable, Description of the Emergency Call
zip: String variable, Zipcode
title: String variable, Title
timeStamp: String variable, YYYY-MM-DD HH:MM:SS
twp: String
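Because timeStamp is stored as a string in the YYYY-MM-DD HH:MM:SS format, a typical first step in this kind of analysis is to parse it into a proper date-time value. The sketch below uses only the Python standard library rather than Pandas. The function names and the sample value are hypothetical and are not taken from the project or the dataset.

```python
from datetime import datetime

# Format stated in the field list above: YYYY-MM-DD HH:MM:SS
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_timestamp(raw):
    """Convert a timeStamp string into a datetime object."""
    return datetime.strptime(raw, TIMESTAMP_FORMAT)


def time_features(raw):
    """Return the hour, month and weekday name for one timeStamp value."""
    ts = parse_timestamp(raw)
    return {
        "hour": ts.hour,
        "month": ts.month,
        "day_of_week": DAY_NAMES[ts.weekday()],  # weekday(): Monday is 0
    }


# Made-up sample value for illustration, not a row from the dataset
sample = "2015-12-10 17:40:00"
features = time_features(sample)
```

Features like these make it possible to group calls by hour, month, or day of the week.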

Moneyball – Using EDA to Identify Replacement Players

Skills: R, Exploratory Data Analysis, gplot, dplyr

Background: During the 2001-02 offseason, the Oakland A’s team lost three key players to teams with larger revenues. The goal of this project is to look at player and salary data for those years, to find players of the same calibre (statistically) who have been undervalued by the

Predicting Boston Housing Prices

Skills: Python, Scikit-learn, Decision Tree Regression, Model Complexity Analysis

Introduction: In this project, we will evaluate the performance and predictive power of a model that has been trained and tested on data collected from homes in suburbs of Boston, Massachusetts. A model trained on this data that is seen as a good fit could

Stock Market Analysis for Tech Stocks

Skills: Python, Pandas, Seaborn, Financial Analysis

In this project, we’ll analyse data from the stock market for some technology stocks. Again, we’ll use Pandas to extract and analyse the information, visualise it, and look at different ways to analyse the risk of a stock, based on its performance history. Here are the questions we’ll

3-Way Sentiment Analysis for Tweets

Skills: Python, Scikit-learn, NLP

Overview: In this project, we’ll build a 3-way polarity (positive, negative, neutral) classification system for tweets, without using NLTK’s in-built sentiment analysis engine. We’ll use a logistic regression classifier, bag-of-words features, and polarity lexicons (both in-built and external). We’ll also create our own pre-processing module to handle raw tweets.
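To make the feature set concrete, here is a minimal sketch of how bag-of-words counts and polarity-lexicon counts might be combined for a single tweet. It uses only the standard library. The tiny word lists, the tokenizer, and the feature names are hypothetical stand-ins, not the project's actual lexicons or pre-processing module.

```python
import re
from collections import Counter

# Tiny hypothetical polarity lexicons, for illustration only
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}


def tokenize(tweet):
    """Lowercase a raw tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def extract_features(tweet):
    """Build a feature dict: one count per word plus two lexicon counts."""
    tokens = tokenize(tweet)
    # Bag-of-words part: how often each token occurs in the tweet
    features = Counter("word=" + tok for tok in tokens)
    # Lexicon part: how many tokens appear in each polarity word list
    features["lexicon_positive"] = sum(tok in POSITIVE_WORDS for tok in tokens)
    features["lexicon_negative"] = sum(tok in NEGATIVE_WORDS for tok in tokens)
    return dict(features)


example = extract_features("I love this phone, great battery but awful camera")
```

A dictionary like this could then be vectorised (for instance with scikit-learn's `DictVectorizer`) and passed to a `LogisticRegression` classifier. That pairing is an assumption about one workable setup, not a description of the project's code.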

Digit Sequence Recognition using Deep Learning

Skills: Python, Keras, Deep Learning, CNN, Computer Vision

In this project, we will design and implement a deep learning model that learns to recognize sequences of digits. We will train the model using synthetic data generated by concatenating character images from MNIST. To produce a synthetic sequence of digits for testing, we will limit the

Cross Language Information Retrieval System

Skills: Python, NLP, IR, Machine Translation, Language Models

Overview: The aim of this project is to build a cross language information retrieval system (CLIR) which, given a query in German, will be capable of searching text documents written in English and displaying the results in German. We’re going to use machine translation, information retrieval

Creating Customer Segments using Unsupervised Machine Learning

Skills: Python, Scikit-learn, PCA, Clustering

In this project, we will analyze a dataset of various customers’ annual spending (reported in monetary units) across diverse product categories, looking for internal structure. One goal of this project is to best describe the variation in the different types of customers that a wholesale distributor interacts with.
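As a rough illustration of the clustering idea, the sketch below groups customers by spending using a basic k-means loop. It is written in plain Python rather than Scikit-learn and is not the project's implementation. The spending figures are made up, and taking the first k points as the starting centroids is a simplifying assumption that keeps the example deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points with a basic k-means loop; returns (centroids, labels)."""
    # Deterministic start: use the first k points as the initial centroids
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Made-up annual spending on two product categories (monetary units)
spending = [(100, 900), (900, 100), (120, 950), (950, 80), (80, 870), (1000, 120)]
centroids, labels = kmeans(spending, k=2)
```

Each resulting label marks a candidate customer segment, and each centroid summarises the typical spending of that segment.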