911 Calls – Exploratory Data Analysis

Skills: Python, Pandas, Seaborn
GitHub

Introduction

For this project we’ll analyze the 911 call dataset from Kaggle. The data contains the following fields:

lat: String variable, Latitude
lng: String variable, Longitude
desc: String variable, Description of the Emergency Call
zip: String variable, Zipcode
title: String variable, Title
timeStamp: String variable, YYYY-MM-DD HH:MM:SS
twp: String…
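Since timeStamp is stored as a string in the YYYY-MM-DD HH:MM:SS format, it typically has to be parsed before calls can be grouped by hour or weekday. The following is a minimal sketch using only the standard library; the sample values and the helper name `parse_timestamp` are hypothetical and not taken from the dataset or the project code. In Pandas, `pd.to_datetime` on the column would play the same role.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample values in the YYYY-MM-DD HH:MM:SS format (not real rows).
sample_timestamps = [
    "2015-12-10 17:40:00",
    "2015-12-10 17:45:10",
    "2015-12-11 08:05:30",
]


def parse_timestamp(value: str) -> datetime:
    """Convert a 'YYYY-MM-DD HH:MM:SS' string into a datetime object."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


parsed = [parse_timestamp(ts) for ts in sample_timestamps]

# Count calls per hour of day and per weekday (Monday == 0).
calls_per_hour = Counter(dt.hour for dt in parsed)
calls_per_weekday = Counter(dt.weekday() for dt in parsed)
```

Counting by hour and weekday like this is one plausible first step for the kind of exploratory plots the project title suggests.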

3-Way Sentiment Analysis for Tweets

Skills: Python, Scikit-learn, NLP
GitHub

Overview

In this project, we’ll build a 3-way polarity (positive, negative, neutral) classification system for tweets without using NLTK’s built-in sentiment analysis engine. We’ll use a logistic regression classifier, bag-of-words features, and polarity lexicons (both built-in and external). We’ll also create our own pre-processing module to handle raw tweets.

Data…
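To make the feature side of this concrete, here is a rough sketch of how bag-of-words counts and lexicon hits might be combined for a single tweet. Everything in it is an assumption for illustration: the tiny `POSITIVE` and `NEGATIVE` word sets, the cleaning rules in `preprocess`, and the feature names are hypothetical and are not the project's actual lexicons or pre-processing module.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons; a real polarity lexicon would be far larger.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def preprocess(tweet: str) -> list[str]:
    """Lowercase the tweet, drop URLs and @mentions, and split into word tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    return re.findall(r"[a-z']+", text)


def featurize(tweet: str) -> Counter:
    """Build bag-of-words counts plus two lexicon-count features."""
    tokens = preprocess(tweet)
    features = Counter(tokens)
    features["__pos_lexicon__"] = sum(t in POSITIVE for t in tokens)
    features["__neg_lexicon__"] = sum(t in NEGATIVE for t in tokens)
    return features


example = featurize("@user I LOVE this phone, great camera! http://t.co/xyz")
```

Feature dictionaries of this shape could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a `LogisticRegression` model trained on the three polarity labels.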

Behavioral Risk Factor Surveillance System 2013 Exploratory Data Analysis

Skills: Descriptive Statistics, R, ggplot, dplyr

In this project, we carry out an exploratory analysis of the BRFSS-2013 dataset by setting out research questions and then exploring the relationships between the identified variables to answer them. To know more about BRFSS and the dataset, visit this link. The project was completed as a part of…

Cross Language Information Retrieval System

Skills: Python, NLP, IR, Machine Translation, Language Models
GitHub

Overview

The aim of this project is to build a cross-language information retrieval (CLIR) system which, given a query in German, can search text documents written in English and display the results in German. We’re going to use machine translation, information retrieval…

Creating Customer Segments using Unsupervised Machine Learning

Skills: Python, Scikit-learn, PCA, Clustering

In this project, we will analyze the internal structure of a dataset containing various customers’ annual spending amounts (reported in monetary units) across diverse product categories. One goal of this project is to best describe the variation in the different types of customers that a wholesale distributor interacts with.…