The practicum provides an opportunity for students to address a real-world analytics problem faced by the sponsor corporation or agency.
The practicum courses are designed to cover two primary areas of content. The first is applying the core tools to a real-world project. The second is giving students useful exposure to the processes and professional development involved in the role of an analytics professional. Students have the opportunity to learn methodologies such as Lean and Agile project management. They are also exposed to conceptual mapping for data practitioners, such as design thinking. They do this within the projects and in coursework.
Please Register at UNH.Analytics@unh.edu
2019-2020 Practicum Projects
360 Intel is a consulting firm in Manchester, NH, that delivers mystery shopping programs, feedback surveys, field audits and brand reputation management to create a 360 view of your customer's experience.
Benevera Health is a technology-enabled services company focused on population health and person-centric healthcare.
Cognia works with schools, systems, and large education agencies to design reliable general and alternate assessment solutions.
Darling Consulting Group
The largest and most experienced Asset Liability Management (ALM) solutions provider in the U.S. DCG helps banks and credit unions manage balance sheets effectively.
FreshAir Sensor manufactures and sells devices to detect components of tobacco and marijuana smoke with their patented, polymer sensor technology (PolySens™).
Liberty Mutual Insurance Company
Liberty Mutual Insurance Company offers a wide range of insurance products and services, including personal automobile, homeowners, specialty lines, reinsurance, commercial multiple-peril, workers compensation, commercial automobile, general liability, surety, and commercial property.
Manchester-Boston Regional Airport
Strategically situated in the heart of New England, Manchester-Boston Regional Airport is a public airport in Manchester, NH, located less than fifty miles north of Boston, MA, and less than an hour’s drive from the region’s most popular ski areas, scenic seacoast beaches and peaceful lakefront resorts.
Martin's Point Healthcare
Martin’s Point is a health care organization leading the way to provide better care at lower costs in the communities we serve. As a not-for-profit organization, our primary mission is to create a healthier community through authentic relationships built on trust.
The Music Hall
From 1878, when The Music Hall first opened its doors as a Vaudeville theater, to its present incarnation as two robust arts venues, it has helped to position downtown Portsmouth, NH, as one of the most vibrant cultural destinations in New England.
RiverStone has transacted various types of deals – from insurance and reinsurance run-off portfolio purchases to outright acquisitions.
Team: Kultida Tongsodsang, Samantha Roberts, Jeffrey Eisenbeis, Callahan Gergen
Project Title: Extracting insightful customers experience by Sentiment Analysis
Background: 360 Intel is a customer experience evaluation service that helps clients from various industries improve their business performance and customer experience by collecting data from mystery shoppers using extensive questionnaires. 360 Intel does not currently have a scientific method for its analysis. As a result, it is unable to clean and organize the large amounts of questionnaire data to provide more context for clients, or to make use of the written responses in a mystery shopper survey. The current report is produced manually and requires excessive time to analyze.
Research Question: Can we develop an analytic process that provides better organization and the ability to extract meaningful insights from the mystery shopper data in order to improve the report’s context?
Scope:
1. Improve the report by extracting insight from written responses, a response type that had never been analyzed before.
2. Clean and organize the data to keep only meaningful data, saving time and simplifying the analysis process.
3. Create an interactive visualization and dashboard to enhance the report with the ability to compare among clients or across industries.
Methods:
1. Extracting insight from written responses using Sentiment Analysis.
2. Cleaning data in Python by reducing the large survey size and keeping only meaningful data.
3. Visualizing data using Tableau.
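As a rough illustration of the sentiment analysis step listed above, the following Python sketch scores written responses with a tiny word list. The word lists, the sample responses, and the function name `sentiment_score` are all hypothetical and are not taken from the team's work; a lexicon-based score is only one simple way to do sentiment analysis.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real project would
# use a much larger, validated word list or a trained model.
POSITIVE = {"friendly", "clean", "helpful", "quick", "great"}
NEGATIVE = {"rude", "dirty", "slow", "unhelpful", "poor"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1].

    Returns 0.0 when no lexicon word appears in the text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    matched = pos + neg
    return 0.0 if matched == 0 else (pos - neg) / matched


# Made-up mystery shopper comments.
responses = [
    "The staff was friendly and the store was clean.",
    "Service was slow and the cashier was rude.",
    "I bought a coffee.",
]
scores = [sentiment_score(r) for r in responses]
print(scores)
```

Scores like these could then be exported to a flat file and aggregated per client or per industry in a dashboard tool.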
Deliverables and Business Impact:
We delivered a dashboard and interactive visualizations that provide meaningful insights from written responses using Sentiment Analysis as a final product. The insightful customer experience dashboard is available for both individual clients and for comparison between clients. These dashboards can help 360 Intel enhance their own analysis and can improve their report for their clients.
Team: Kiana Brigham, Sindhu Veeramachaneni, Zachary Zalman
Project Title: Predicting Avoidable Emergency Department Visits
Background: Benevera Health is a population health management company that uses advanced analytics as well as a health care team to improve health outcomes of Harvard Pilgrim insured members. The analytics team utilizes Electronic Medical Records (EMR), insurance claims and member data to target at-risk members, while the health care team is responsible for outreach and assisting members with medical, mental/behavioral, financial, housing and transportation needs.
Research Question: Can the team develop a predictive model that identifies Benevera members with a high risk of experiencing an avoidable emergency department (ED) visit in the next 3-6 months? Can the team also determine variables that contribute to a high-risk classification?
Scope: To perform the analysis, the team utilized medical claims, demographic information and insurance plan information from Harvard Pilgrim insured members from 2016-2019. In addition, the team used the NYU Wagner Avoidable ED Algorithm to classify an emergency department insurance claim as avoidable or unavoidable.
Methods: To address the research questions, the team conducted exploratory data analysis that included K-Means Cluster Analysis. Many other techniques were included, such as feature engineering, logistic regression, random forest, and a neural network. The most successful model is the neural network, which was able to correctly classify 75% of emergency department visits as either avoidable or unavoidable.
Deliverables and Business Impact: The deliverable is a set of Jupyter notebooks with Python code, including a method to extract subsets of the claims data as well as notebooks covering the exploratory data analysis, k-means cluster analysis, machine learning and deep learning. Estimates of the cost savings for each avoidable ED visit prevented were also given to the company.
Team: Mike Austria, Celine Hsieh, Jacob Mannix, Cassie Putsch
Project Title: An Interactive Approach to Data Exploration
Background: Cognia is a nonprofit organization dedicated to helping schools improve so that students and teachers have the opportunities they deserve. Part of Cognia's mission is to empower its employees to do their own research with the data they have and to make their own data-driven decisions. At present, Client Services cannot efficiently query information from the company's surveys and does not have quick access to the appropriate information. Staff must go through a long and inefficient process with the Research and Analytics Department, in the hope that what they receive includes all the information they requested. They are typically given only that information and nothing more, so data exploration is not feasible.
Research Question: Is an interactive dashboard feasible for solving this efficiency problem? Can such a dashboard be made so that it is usable for Client Services and other members of the company to allow for easy data access and exploration?
Scope: The analysis used a sample of internal Cognia survey data from 2018-2019. The data was only from the United States for the purpose of simplicity, but Cognia does work with institutions from all around the world.
Methods: To address the key question, the team utilized Python and Tableau to test the feasibility of a dashboard solution. Python was used for exploratory data analysis as well as some data cleaning, and Tableau was used to build an interactive dashboard.
Deliverables and Business Impact: The deliverable is a fully functional interactive dashboard that will be used as a proof of concept for Cognia leaders. The goal is for this solution to be implemented in the future to make the process of data exploration much more efficient. The team is giving a presentation to Cognia executives to show the potential of this dashboard and what it can do for the company. The potential business impact for Client Services is a great improvement to the tedious process of requesting data from the Research and Analytics Department. In addition, staff will be able to explore the data themselves, beyond their initial query, advancing Cognia's mission of empowering employees to explore the data.
Team: Abdalla Abdelmoaty, Eric McNeilly, Cristiano Costa, and Marc Arias
Project Title: Predicting a Financial Institution’s Non-Maturity Balances Using Publicly Available Call Report Data
Background: Non-maturity deposits play an important role because they are the main source of funds for banks, and their balances are related to changes in the federal funds rate. In this context, Darling asked us to predict non-maturity deposit balances for all bank branches in the US and to understand how fluctuations in the federal funds rate could affect those predictions over the next twelve months.
Research Question: Is it possible to predict the non-maturity deposit balance of bank branches for the next twelve months given a flat interest rate?
Scope: The analysis used publicly available FDIC call sheet reports and included financial institutions that were active between March 2015 and September 2019 and with total assets less than 10 billion dollars.
Methods: To address this question, the team utilized three main techniques: feature engineering, time series clustering, and Long Short-Term Memory (LSTM) networks.
Deliverables and Business Impact: The deliverable was a well-documented ensemble system implemented in Python and delivered in a PowerBI dashboard. This model will be used to market Darling Consulting Group’s Deposits360 analytics services and provide insight into the banking community, increasing potential clientele and enhancing existing customer experience.
Teams: Veena Ganamukhi, Eric Knop, Joshua Roberge, Mukta Singh, Kiana Brigham, Sindhu Veeramachaneni, Zachary Zalman
Project Title: Identifying Smoking Events: Using machine learning algorithms and feature engineering to differentiate smoking and non-smoking events.
Background: FreshAir Sensor manufactures, sells and services devices to detect components of tobacco and marijuana smoke with their patented, polymer sensor technology (PolySens™). FreshAir devices have Wi-Fi capabilities and provide alerts of smoking incidents in real-time, enabling clients to effectively monitor and deter smoking in prohibited areas. Devices are sold to hotels, apartments, housing authorities, schools and other multi-unit, professionally managed properties. FreshAir’s clients have a clear need (often a legal requirement) to provide a smoke-free environment for guests, residents, students and/or tenants.
Research Question: FreshAir would like us to create a machine learning system that can differentiate smoking and non-smoking events correctly.
Scope: In this project, we designed, implemented and tested a machine learning system to process anomalies detected from existing FreshAir devices and computed the likelihood that smoking has taken place. Our team was given access to timeseries data from 100,000+ labelled anomalies and completed the following tasks:
Data preparation and pre-processing
Feature engineering from raw sensor data
Model selection and training based on features
Methods: We ensembled 3 different deep learning networks and used that model to predict the events.
Deliverables and Business Impact: The model is 90% accurate. It misses only 25 smoking events in every 10,900, and it reduces the number of events that must be manually reviewed by 84%.
Team: Abdalla Abdelmoaty, Eric McNeilly, Cristiano Costa, and Marc Arias
Project Title: Identifying anomalies in the Safeco AutoQuote Mainframe system
Background: Liberty Mutual’s Information Technology sector is the backbone and engine that makes the insurance company run. Liberty processes billions of transactions a month and, occasionally, some individual transactions take longer than usual to run. These transactions consume valuable and expensive resources in the Mainframe system. Liberty Mutual asked us to help identify the transactions that take excessively long to execute and to better understand the attributes that cause them to be labeled as anomalies.
Research Question: Is it possible to identify anomaly transactions and understand what makes them unique?
Scope: Liberty Mutual provided data from Safeco AutoQuote representing around 86 million tasks, with 9 computational parameters aggregated by day and hour, that occurred between 9 a.m. and 2 p.m. from December 2019 to March 2020.
Methods: To address this question, the team utilized a number of techniques, including feature engineering, dimensionality reduction, clustering, and ensemble anomaly models.
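To make the idea of flagging unusually slow transactions concrete, here is a minimal Python sketch that uses a modified z-score based on the median and the median absolute deviation (MAD). This is a simple stand-in chosen for illustration, not the team's ensemble; the sample durations, the function names, and the 3.5 cutoff (a commonly used convention for this score) are assumptions.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or at least half of them) are identical; no spread to measure.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_slow(values, threshold=3.5):
    """Return indices whose score exceeds the threshold on the slow side only."""
    return [i for i, z in enumerate(modified_z_scores(values)) if z > threshold]


# Hypothetical transaction run times in milliseconds.
durations_ms = [12, 11, 13, 12, 14, 11, 12, 250, 13, 12]
slow = flag_slow(durations_ms)
print(slow)
```

Because the median and MAD are barely affected by a few extreme values, a single very slow transaction does not mask itself the way it can with a mean and standard deviation.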
Deliverables and Business Impact: The deliverable was a well-documented ensemble system implemented in Python and delivered in a PowerBI dashboard. This model will be used to help the Mainframe operations team to identify anomalies in a more efficient way, increasing productivity, saving time and bringing better insights.
Team: Mike Austria, Celine Hsieh, Jacob Mannix, Cassie Putsch
Project Title: Developing a Fully Allocated Cost of Travel Estimator
Background: Manchester-Boston Regional Airport is a small airport located in Manchester, NH. It aims to serve as many passengers as possible in the Southern NH and Northern MA areas. In recent years, the airport's profit has declined due to higher fixed and operational costs than the nearby Boston-Logan Airport and a loss of passengers to Boston-Logan.
Research Question: Can we find the value that business and leisure passengers place on their time by looking at different modes of transportation to the airport? Can we develop a way to gain insights into how people value their time when traveling to an airport and use this insight in order to aim marketing efforts effectively?
Scope: The analysis involved open source data scraped from Uber and MapQuest as well as survey results generated by the team.
Methods: Although finding an exact answer to this question turned out not to be feasible given the available resources, the team used Python to scrape Uber and MapQuest data, Tableau to visualize the collected data and display the differences in Uber fares to Manchester and Boston, and Qualtrics to run a pilot survey for a small amount of data collection. The collected data was then used to create a simulation in Python that estimates the cost of travel to Manchester and Boston from different zip codes, given different modes of transportation.
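One plausible way to express a fully allocated cost of travel is out-of-pocket cost plus the traveler's value of time multiplied by travel time. The Python sketch below works through that arithmetic. The formula, the fares, the travel times, and the value of time are all hypothetical assumptions for illustration and do not come from the team's simulation or data.

```python
from dataclasses import dataclass


@dataclass
class TripOption:
    airport: str
    fare_usd: float        # out-of-pocket cost, e.g. a ride fare (made up)
    travel_minutes: float  # door-to-terminal travel time (made up)


def fully_allocated_cost(option: TripOption, value_of_time_per_hour: float) -> float:
    """Out-of-pocket cost plus the monetized cost of time spent traveling."""
    return option.fare_usd + value_of_time_per_hour * option.travel_minutes / 60


# Hypothetical trip from one zip code to each airport.
options = [
    TripOption("Manchester", fare_usd=35.0, travel_minutes=30.0),
    TripOption("Boston", fare_usd=60.0, travel_minutes=75.0),
]
costs = {o.airport: fully_allocated_cost(o, value_of_time_per_hour=40.0) for o in options}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

Running the same calculation over many zip codes and transportation modes, with different values of time for business and leisure travelers, is one way such an estimator could be structured.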
Deliverables and Business Impact: The final deliverable included the results of the pilot survey that offer additional insights, a report outlining the necessary parameters to input into the cost of travel estimator so that it can be implemented on the airport website in the form of a chat bot, and the Tableau visualizations outlining Uber fares. These deliverables will hopefully be able to help Manchester-Boston Regional Airport market the benefits of choosing this airport as well as serve as tools for educating potential passengers about the costs incurred while getting to an airport, showing that going to Manchester may actually be less expensive than going to Boston.
Team: David Ramsay, Jordan Myerowitz, Sally Akuffo, Erik Duisberg
Project Title: Capacity Utilization Analysis
Background: With about 100 medical providers offering medical services from seven locations in Maine and New Hampshire, one of MPHC's most pressing concerns is the utilization of its Provider assets. But measurement of intelligent, caring and sophisticated people is complicated and multidimensional.
Research Question: Can UNH help Martin's Point increase its capacity utilization and better balance its load of patients across its Providers?
Scope: The analysis accessed de-identified and encrypted appointment scheduling, appointment type, claims, provider availability, provider contract and provider booked hours tables.
Methods: Several iterations of the data were analyzed, with the final dataset comprising approximately 8 million records. EDA was performed in Python, with flat files exported to Excel for additional analysis and to Tableau for creation of a dashboard.
Deliverables and Business Impact: A flexible dashboard that measures capacity utilization and other related metrics, by Provider, by Provider Type, by Provider Category and Facility over monthly and yearly timeframes. After submitting initial dashboard and analysis for feedback, the team also delivered a benchmarking study of potential capacity utilization improvements that could be made by bringing below-average individual provider productivity levels up to the mean level (simple average). Potential productivity improvement was estimated to be 1 to 4 percent of total Provider booked hours, or approximately 1000 to 4000 Provider hours per year.
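The benchmarking idea described above, raising below-average providers to the simple mean, can be sketched in a few lines of Python. The provider labels, the hour figures, and the choice of booked hours per year as the productivity measure are hypothetical; the team's actual metric is not specified here.

```python
from statistics import mean


def potential_gain(productivity_by_provider):
    """Return the mean, each below-mean provider's gap to it, and the total gap."""
    avg = mean(productivity_by_provider.values())
    gaps = {p: avg - v for p, v in productivity_by_provider.items() if v < avg}
    return avg, gaps, sum(gaps.values())


# Hypothetical booked hours per year for four providers.
booked_hours = {"A": 1500.0, "B": 1700.0, "C": 1600.0, "D": 1200.0}
avg, gaps, total_gain = potential_gain(booked_hours)

# Potential improvement expressed as a share of all booked hours.
share = total_gain / sum(booked_hours.values())
print(avg, gaps, total_gain, share)
```

In this toy example only provider D is below the mean, so the estimated gain is D's 300-hour gap, or 5% of total booked hours.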
Team: Kultida Tongsodsang, Samantha Roberts, Jeffrey Eisenbeis, Callahan Gergen
Project Title: Predicting the Profile of a Successful Artist Performance at The Music Hall
Background: The Music Hall is a performance venue in Portsmouth, NH. It currently has two locations, but the one that we are focusing on is the 900 fixed-seat theater. The Music Hall has difficulty consistently identifying artists within its range that will be successful, and it currently has no scientific method behind its artist scouting process. The Music Hall’s ability to collect data is limited, as is its ability to use the data it has collected. It currently uses only data it has collected itself and has not attempted to look into alternative data sources that could be combined with its own.
Research Question: Can we improve The Music Hall’s artist selection process, and improve their decision-making ability by recommending artists that will perform and be successful at their 900-seat theater?
Scope:
- Researched and acquired additional useful data from Social Blade, Chartmetric, and Google Demand.
- Formatted and imputed data in a replicable manner using Python, including determining a relevant success metric.
- Input data to train a Random Forest model, which was then applied to a set of test artists who had not performed at The Music Hall, to generate a prediction of the level of success a given artist would have.
Methods:
- Downloading and inputting data from Social Blade, Chartmetric, and Google Demand.
- Data cleaning and imputing to create a manageable and insightful dataset using Python.
- Random Forest model and predictions generated in Python.
Deliverables and Business Impact: We have delivered a replicable data cleaning and modeling process using data gathered by The Music Hall, with additional data from reputable sources. This process returns predictions of a leveled success metric, which can then be used as an additional source of insight when deciding whether to bring an artist to The Music Hall’s theater venue.
Team: David Ramsay, Jordan Myerowitz, Sally Akuffo, Erik Duisberg
Project Title: Regional Clustering Analysis
Background: EPHT has the responsibility for tracking and maintaining both Social Determinants of Health ("SDoH") and Health Outcomes data throughout the State of New Hampshire in order to better serve the state's citizens and address environmental public health concerns. EPHT's Community Health Outlook reports were developed from scratch in R and were geared toward providing consistent packages of information to specific communities in NH. With data available at this spatial resolution, EPHT sees Data Science – and leveraging interrelationships of data and statistics across Communities, Regions and Counties – as a next logical step to help inform their decision-making process with the information and data already available.
Research Question: Can UNH reconcile data across geographies, in a way that introduces new data science techniques to help facilitate health interventions now and in the future?
Scope: To evaluate spatial patterns of Health Outcomes as they relate to SDoH using data available via the Community and Region Health Outlooks and supplemental materials available on the NH WISDOM Data Portal. No confidential information was shared.
Methods: Our study used Exploratory Data Analysis ("EDA"), Principal Component Analysis ("PCA") and k-Means Clustering in Python to examine the SDoH and Outlooks scoped above, aggregate statistics for relevant geographies, reduce the dimensionality of the dataset, explore clustering for various options and finally determine the best geographic unit of analysis. Results were then compiled into k = 5 clusters, and these five Character clusters were described in terms of their statistical makeup and locations. Finally, flat files were created for export into Tableau for creation of a dashboard that will empower EPHT to view their compiled data at will within the geographies discussed in the presentation.
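For readers unfamiliar with k-means, the following is a minimal pure-Python sketch of the standard iterative procedure (Lloyd's algorithm) on toy two-dimensional points. It is not the team's code: the data are made up, the demo uses k = 3 rather than the study's k = 5, the PCA step is omitted, and initializing centroids from the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids)."""
    # Simplified deterministic initialization: use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy data: three well-separated groups (hypothetical values).
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.1, 4.9), (8.8, 1.1),
    (0.9, 1.1), (4.9, 5.2), (9.1, 0.9),
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

In practice the input features would typically be standardized first, since k-means relies on Euclidean distance and is sensitive to differences in scale.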
Deliverables and Business Impact: Deliverables included a presentation illustrating that while Outcomes were not used to create Regional clusters, these Outcomes (such as low birth weights, certain Emergency Department visits, elevated child blood lead levels and life expectancies) were well-linked to clusters formed using the SDoH variables. We suggested to EPHT that managing health programs through these five Character clusters would contribute to a virtuous circle of NH public health improvement. In order to provide subsequent analysis opportunities to EPHT, the above Tableau dashboard was also created. Subsequent to the presentation, a package of files and descriptive material, including the Dashboard, will be presented to EPHT. This project was viewed as a pilot collaboration between EPHT and UNH Analytics & Data Science; based on the level of participation by constituents at EPHT, Bureau of Public Health Protection and Department of Health and Human Services, we were optimistic that the pilot would lead to future collaborations which was the long-term goal of this project.
Team: Veena Ganamukhi, Eric Knop, Joshua Roberge, Mukta Singh
Project Title: Identifying Risky Users: Using cyber security data to create risk scores for users.
Background: RiverStone is in the insurance and reinsurance marketplace and they are a subsidiary of Fairfax.
Research Question: Can we create generalized risk scores for each user, along with a dashboard that adds clarity to the scores?
Scope: RiverStone uses five different types of cyber security software to monitor their network. Each type of software is siloed and does not allow their cyber security experts to see which users are a risk and which users are not. We completed the following tasks:
Data preparation and pre-processing
Using different supervised machine learning methods based on the features
Creating a dashboard to add clarity to the risk scores
Methods: We used the concept of an autoencoder to output risk scores for each user, along with a linear model that allows the user to adjust the weights.
Deliverables and Business Impact: The dashboard allows the user to switch between the two risk scores and has graphs and filters that allow the user to find the pool of risky users and understand why they are risky. It also gives a single view of all risks across the whole company.
2018-2019 Practicum Projects
Team: Nick Zylak, Jeremy Dickinson, Manoj Virigieneni, Mark McComiskey
Title: Identifying Traffic Wind Gusts: Determining Predictability of Roadway Impacts by Type
Scope / Research Question: Utilizing Scout-generated data, AvantCourse would like to differentiate natural wind readings from those generated by traffic. Additionally, the team is examining the potential of predicting Scout readings, with their sensor-based data collection tool, using macro-level weather data. Currently, there exists no consistent method for identifying traffic wind.
Methods: We performed exploratory data analysis on the Scout data. Modeling will utilize various machine learning algorithms, including logistic regression, random forests, and XGBoost.
Team: Jiale Zhao, Viraj Salvi, Ben Forleo, Chad Lyons
Title: Identifying Urban Canyons Using Street Level Wind Data in the Boston Metro Area
Scope/Research question: Our UNH team focused on wind data collected by the Scout to identify a phenomenon known as urban canyons within the greater Boston metropolitan area. For the purpose of this project, an urban canyon is defined as a segment of any street where the physical environment helps channel and amplify street level wind speeds on a consistent basis. Identifying urban canyons may be of particular interest to fleet managers, as high wind areas may subject vehicles to increased air resistance and negatively impact gas mileage. If urban canyons can be identified, they could potentially be included in route optimization algorithms that help fleets operate efficiently and minimize costs.
- Regression splines (LOESS) were used to estimate wind speed at various weather stations on a per-second basis, because NOAA weather data is only available on a per-hour basis.
- The team utilized OSMnx, a Python library that represents street maps as a network graph, to break the road grid into discrete segments separated by intersections.
- Predicted wind values for NOAA weather stations were compared to average road level data for each road segment for which the team had data.
- The team constructed an interactive dashboard using Plotly and Dash to visualize the results of this analysis.
Team: Jessica Hammond, Monit Guin, James Blauvelt, Shatrughan Sharma
Title: Optimizing Financial Institutions’ Reactions to Changes in Federal Interest Rates
Background: Financial institutions make money off the margin between the Federal Reserve’s rate and their respective consumer interest and business loan rates. It is difficult for them to know how much to adjust their rates when the Federal Reserve’s rate fluctuates. After an unprecedentedly low rate environment that lasted for several years, deposit rates are starting to increase as we move further into a rising rate environment.
Research Question: Should the Federal Reserve continue to raise rates, what will the impact be on financial institutions’ interest-bearing non-maturity deposits? In different marketplaces, how does a bank’s reaction to the federal funds rate affect its ability to maintain and grow deposits?
Scope: The analysis used publicly available FDIC call sheet reports, and included financial institutions that were active between 2000-2018 and with total assets less than 20 billion dollars.
Methods: To address this question, the team utilized a number of techniques including exploratory data analysis, feature engineering, outlier analysis, clustering analysis, Autoregressive Integrated Moving Average forecasting techniques, and deep learning models. The deliverable will be a well-documented ensemble system of rate optimization. It will be implemented in R and delivered in a PowerBI dashboard.
Team: Sam Isenberg, Maz Hejazidahaghasi, Neha Narla, Michell Friend
Title: Predicting Whether Banks Will Fall Below "Well Capitalized"
Scope / Research Question: Darling Consulting Group would like to identify banks at risk of falling below “well capitalized,” which occurs when a bank’s leverage ratio falls below 5% or its total capital ratio falls below 10%. Banks that fall below these marks are subject to certain regulatory restrictions, and therefore Darling would benefit from advance knowledge of banks in danger of crossing these thresholds. The project’s goal is to project banks’ future leverage ratios and total capital ratios based on publicly available FDIC data.
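The two thresholds stated above translate directly into a simple check. The Python sketch below applies them to projected ratios; the bank names and ratio values are hypothetical, and the function name `is_well_capitalized` is introduced here only for illustration.

```python
# Thresholds as stated in the project description.
LEVERAGE_MIN = 0.05       # leverage ratio of at least 5%
TOTAL_CAPITAL_MIN = 0.10  # total capital ratio of at least 10%


def is_well_capitalized(leverage_ratio: float, total_capital_ratio: float) -> bool:
    """A bank falls below 'well capitalized' if either ratio is under its threshold."""
    return leverage_ratio >= LEVERAGE_MIN and total_capital_ratio >= TOTAL_CAPITAL_MIN


# Hypothetical projected (leverage ratio, total capital ratio) pairs.
projected = {
    "Bank A": (0.082, 0.131),
    "Bank B": (0.047, 0.115),  # leverage ratio below 5%
    "Bank C": (0.061, 0.096),  # total capital ratio below 10%
}
at_risk = [name for name, ratios in projected.items() if not is_well_capitalized(*ratios)]
print(at_risk)
```

A forecasting model would supply the projected ratios; the check itself only encodes the rule.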
Methods: To address this question, the team utilized ARIMA time-series modeling and LSTM modeling, along with traditional machine learning modeling.
Team: Ben Weckerle, Manoj Virigineni, Jen Legere, Alicia Hernandez
Title: Eliminating Manual Review from a Smoking Alert System
Background: FreshAir Sensor is a company that helps its clients maintain a clean environment, detecting cigarette and marijuana smoke with their proprietary chemical sensors. Their market is providing detection services to hotels and other establishments.
Research Question: FreshAir Sensor currently employs a process that requires a manual data review for every case of suspected smoking particulates via their sensor technology. This process requires an employee to be on call at all times. At present, FreshAir devices collect data from 5 internal, high-frequency sensors. Once an anomaly is detected, the event data is sent to a manual reviewer who makes a decision on whether the alert was caused by smoking or not. This project seeks to examine detection recognition algorithms so as to reduce the number of alerts manually reviewed by employees by implementing a machine learning system to better discriminate between smoking events and other anomalies.
Methods: To reduce the number of alerts that need to be manually reviewed, the team will provide an ensemble of predictive models. The ensemble will identify alerts that do not need to be reviewed (having a high degree of confidence of smoking or not smoking). Algorithms that contribute to this ensemble include a Support Vector Machine (SVM), a Random Forest, an Extreme Gradient Boosting (XGBoost) classifier, and an Artificial Neural Network (ANN).
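The triage idea, resolving only the alerts the ensemble is confident about and sending the rest to a person, could look roughly like the Python sketch below. Averaging the model probabilities and the 0.05 and 0.95 cutoffs are assumptions made for illustration; the source does not state how the ensemble combines its models or which confidence thresholds it uses.

```python
from statistics import mean


def triage(model_probabilities, low=0.05, high=0.95):
    """Route an alert based on the average predicted probability of smoking.

    model_probabilities holds one probability per model in the ensemble.
    """
    p = mean(model_probabilities)
    if p >= high:
        return "smoking"
    if p <= low:
        return "not smoking"
    return "manual review"


# Hypothetical probabilities from four models (e.g. SVM, random forest,
# XGBoost, and a neural network) for three alerts.
alerts = {
    "alert-1": [0.99, 0.97, 0.98, 0.96],
    "alert-2": [0.02, 0.01, 0.03, 0.02],
    "alert-3": [0.60, 0.40, 0.75, 0.55],
}
decisions = {alert_id: triage(probs) for alert_id, probs in alerts.items()}
print(decisions)
```

Widening the gap between the two cutoffs sends more alerts to manual review and lowers the risk of an automated mistake, so the thresholds set the trade-off between workload and missed events.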
Team: James Blauvelt, Brennan Donnell, Sam Isenberg, Joanna Grory
Title: Sensor Anomaly Classification with Deep Learning
Research Question: Can the number of manually reviewed events be reduced while simultaneously letting a minimal number of smoking events go undetected?
Methods: To reduce the number of alerts sent to manual review, and to maintain a high level of accuracy, both feature-based and deep learning approaches were explored. A convolutional neural network produced the best results and will allow FreshAir to more accurately triage events that are sent to be manually reviewed.
Team: Eric Dorata, Ben Forleo, Anna Kot, Mark McComiskey
Title: Visualizing Storage
Background/Scope: Identifying server storage capacity is often difficult, forcing data professionals to scramble to address critical business interruptions because of insufficient disk storage. The challenge lies in accurately identifying when systems will run out of storage space - which can have significant consequences for Liberty Mutual Insurance Company, slowing data delivery or bringing business operations to a halt.
Methodology: In addressing the challenge, the project team applied several data cleaning and analytical techniques aimed at uncovering the hidden structure of the data and identifying trends. These analytical techniques included dimension reduction, mixed-type clustering, and time series analysis.
Findings: As a result of the analysis, intelligent reporting will be implemented via integrated descriptive dashboards, enabling Liberty Mutual Insurance Company to proactively manage compute resources.
Team: Phoebe Robinson, Jeremy Dickinson, Bayleigh Logan
Title: Predicting High-Cost User Groups and Identifying Beer’s List Prescriptions
Background: Martin’s Point cares for a large population of Medicare and elderly patients in its practices and through its coverage plans.
Research Question: The task is to identify individuals currently receiving care from Martin’s Point who may have been prescribed medications that the American Geriatrics Society Beers Criteria currently flag as inappropriate for older adults.
Methods: Using Martin's Point Generations Advantage medical and pharmacy claims data as well as the current version of the American Geriatrics Society Beers Criteria, the team applied a number of techniques including data mining, feature engineering, outlier analysis, descriptive analytics, and data visualization.
Team: Jessica Hammond, Monit Guin, James Blauvelt, Sharma Shatrughan
Title: Understanding High-Cost Users
Research Question: Martin's Point faces a highly skewed distribution of cost, with 9% of the population accounting for more than 50% of the total cost. The purpose of this study is to understand the main clinical groupings of individuals covered by Martin’s Point Generations Advantage insurance who incur high costs. The task was to segment members based on total cost, identify the movement of members between the cost segments over a three-year period, and provide additional insight into the characteristics of high-cost users.
Methods: We utilized Martin's Point Generations Advantage medical and pharmacy claims data. Members were required to have at least 3 years of continuous enrollment during the sample period (2014-2018). To address this question, the team applied a number of techniques including data mining, feature engineering, outlier analysis, clustering analysis, and data visualization.
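As a rough illustration of the two quantities in this study, the sketch below computes the share of total cost incurred by the most expensive members and counts member movement between cost segments from one year to the next. The function names, segment labels, and all figures are hypothetical and are not drawn from the Martin's Point data.

```python
from collections import Counter

def top_share(costs, fraction):
    """Share of total cost incurred by the top `fraction` of members."""
    ranked = sorted(costs, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # at least one member
    return sum(ranked[:k]) / sum(ranked)

def segment_transitions(segments_before, segments_after):
    """Count (from_segment, to_segment) moves for members present in both years.

    Both arguments map a member id to a cost segment label.
    """
    return Counter(
        (segments_before[m], segments_after[m])
        for m in segments_before
        if m in segments_after
    )

# Hypothetical data: one very expensive member among ten.
costs = [1000] + [100] * 9
print(f"top 10% share of cost: {top_share(costs, 0.10):.1%}")

year1 = {"a": "low", "b": "high", "c": "low"}
year2 = {"a": "low", "b": "low", "c": "high"}
print(segment_transitions(year1, year2))
```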
Team: Jared Fortier, Jen Legere, Amy Chang, Neha Narla
Title: Identifying Factors That Drive Successful Referrals for Managed Care
Scope / Research Question: Martin’s Point has a predictive modeling tool that uses patient data to refer patients for managed care, but it greatly underperforms compared to caseworker referrals. This low success rate diverts resources away from contacting and enrolling the patients that will be eligible for managed care, leading to unnecessary costs and potentially to negative impacts on member health outcomes.
Methods: Using de-identified member data, we addressed the question of why members are or are not meeting criteria for referral. We used this information to try to address the low referral success rate of the data-driven referral model. The specific methods used to address this question include random forest models, support-vector machine models, and LSTM neural networks.
Team 1: Devan Miller, Anna Kot, Ben Weckerle, Brennan Donnell
Title: Predictive Analytics: A Targeted Approach to Student Retention and Success
Background: Because it affects university rankings, school reputation, and financial well-being, student retention has become one of the most important measures of success for institutions of higher education. Freshman attrition at Plymouth State University has remained steady at 30%.
As students have increasing options for educational and career opportunities, Plymouth State University engaged the University of New Hampshire to understand the causes behind freshman attrition, how to accurately predict at-risk students, and appropriately intervene to retain them.
Methodology: Using six years of institutional data along with relevant data mining techniques, the project team developed analytical models to predict freshman attrition, including attrition likely to result from low academic performance. Models included regression, random forests, support vector machines, and gradient boosting. Variable importance analysis of the models was conducted to identify which predictors matter most for freshman attrition.
As a result of the analysis, incoming first-year students can be placed into one of four targeted cohorts based on their predicted likelihood of leaving Plymouth State University, enabling University personnel to examine risk and recommend proactive advising approaches.
Team: Frawley Barton, Maz Hejazidahaghani, Amy Chang, Jiale Zhao
Scope / Research Question: To predict, based on several factors, whether a student will graduate, that is, whether the student will finish their studies at Plymouth.
Methods: Data mining, with controls for imbalanced classes. A variety of analytical tools were employed, including Support Vector Machines, Random Forests, Logistic Regression, and XGBoost models.
Team: Matt Heckman, Monit Guin, Dishyant Kumar, Sam Karkach, Phoebe Robinson, Nick Zylak, Jared Fortier, John Gagno
Title: Influencing Litigation Strategies in the Insurance Industry
Research Question: RiverStone has a large number of reinsurance claims that have the potential to be brought to court. Our task was to identify additional data points that could aid in developing predictive models to better direct claims mitigation resources.
Scope: The analysis utilizes court case APIs from various asbestos claims across a number of states.
Methods: We utilized a number of techniques including data mining, JSON parsing, feature engineering, tagging, regular expressions, machine learning, and data visualization.
Team: Bayleigh Logan, Sam Karkach, Frawley Barton, Mitchel Friend
Title: Building a Triaging System for Incoming Customer Emails
Scope: Tasked with building a system for automating and directing customer emails to the appropriate department for processing and minimizing the workload put on the help desk teams.
Methods: Experimented with different text classification approaches such as TF-IDF with Logistic Regression, Support Vector Machines, and XGBoost, as well as Recurrent Neural Networks with LSTM layers. Implemented a system with two prediction phases and a validation threshold for setting the confidence levels of the final predictions.
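One way a two-phase system with a validation threshold could be arranged is sketched below. The source does not say how the phases were combined, so the control flow here is an assumption: each phase returns a department and a confidence, a prediction is accepted only if its confidence meets the threshold, and anything else falls back to the help desk. The classifier stand-ins, department names, and the 0.8 threshold are hypothetical.

```python
def route_email(text, phase_one, phase_two, threshold=0.8):
    """Route an email using two prediction phases and a confidence threshold.

    phase_one and phase_two are callables that take the email text and
    return (department, confidence). Low-confidence emails stay with
    the help desk for manual routing.
    """
    for phase in (phase_one, phase_two):
        department, confidence = phase(text)
        if confidence >= threshold:
            return department
    return "help desk"

# Stand-in classifiers for illustration only; real phases would wrap
# trained models such as TF-IDF + logistic regression or an LSTM.
def keyword_phase(text):
    return ("billing", 0.95) if "invoice" in text.lower() else ("unknown", 0.10)

def fallback_phase(text):
    return ("claims", 0.85) if "claim" in text.lower() else ("unknown", 0.20)

print(route_email("Question about my invoice", keyword_phase, fallback_phase))
print(route_email("Status of my claim", keyword_phase, fallback_phase))
print(route_email("Hello there", keyword_phase, fallback_phase))
```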
Team: Chad Lyons, Dan Konig, Viraj Salvi, Dushyant Kumar
Title: Building a Triaging System for Incoming Customer Emails
Scope: Tasked with building a system for automating and directing customer emails to the appropriate department for processing and minimizing the workload put on the help desk teams.
Methods: Utilized TF-IDF vectorization and applied several machine learning algorithms to predict the true target department of incoming emails. XGBoost and logistic regression with doc2vec were tested for a one-phase prediction.
Team: Joanna Grory, Jon Bieniek, Heather Frechette, John Cagno
Title: Assessing the Impact of World Bank Interventions on the Health of the Caribbean Large Marine Ecosystem and the Prosperity of Human Populations that Depend on It
Scope / Research Question: In 2002, the World Bank funded a Caribbean-wide project to address three major areas of concern in the Caribbean Large Marine Ecosystem (CLME): unsustainable exploitation of fish and other living resources, habitat degradation and modification, and pollution. The project was evaluated upon completion to assess its success in meeting the primary goal: to help Caribbean countries improve the management of their shared marine resources through an ecosystem-based management approach. However, the Global Environment Facility lacks information about the project’s broader impact on the health of the CLME and the socio-economic situation of the human populations that depend on it. This project addresses that information gap by analyzing pre- and post-intervention trends in multiple metrics of ocean health and human welfare.
Partner / Team: Independent Evaluation Office of the Global Environment Facility of the World Bank – Blue Economy Projects
Team: Alicia Hernandez, Devan Miller, Eric Dorata, Shatrughan Sharma
Title: Digitizing Chemical and Waste Portfolio
Project Scope: Reviewing and synthesizing government reports is often difficult. Because the reports are lengthy and lack standardized formats, evaluation officers must allocate extensive time and resources to summarizing key themes. The challenge lies in accurately identifying which sections of a report contain relevant findings, a task that has significant consequences for the Global Environment Facility because it requires many labor hours and is subject to human error.
Methods: In addressing the challenge, the project team applied text mining techniques to identify relevant sections of the reports and developed a Shiny application as a proof-of-concept multi-summarization documentation tool that doubles as an interactive dashboard showcasing the GEF’s project history and descriptive project information.
2017- 2018 Practicum Projects
Inpatient falls at Elliot impose a heavy penalty on the hospital and, even more importantly, are major catastrophic events for patients. Elliot tracks these falls and uses the Johns Hopkins Fall Risk Assessment Tool (JHFRAT) to categorize a patient’s fall risk level. Our aim is to assess the performance of the JHFRAT at Elliot and elicit further patient information to be used in conjunction with machine learning methods to improve the tool’s accuracy.
Team members: Nisha Muthukumaran, Meseret Tekle, Daniel Walsh, Steven Glover, Julia Vaillancourt, Brandon Epperson
Team members: Patrick Kispert, Jacob Daniels, Christine Hanson, Sarah Brewer, Caroline Lavoie, Nemshan Alharthi, Serina Brenner
The largest provider of group disability in the nation is looking for improved ways to more accurately model risk at the client level. Leveraging third-party data, patterns can be identified in historically good and bad risks to develop a model that better predicts future risk performance.
Team members: Philip Bean, Joy Lin, Brandon Bryant, Sarah Brewer, Gowri Neeli, Olufisayo Dada
Martin's Point Health Care (MPHC) provides healthcare services as both an insurance carrier and a medical provider. MPHC is interested in increasing its “Overlap” population. An “Overlap” member is someone who is enrolled in one of Martin’s Point insurance plans and also receives medical care with Martin’s Point Healthcare. The primary question addressed was: How can MPHC increase its overlap population to provide a more comprehensive healthcare experience?
To examine this question, the team utilized clustering techniques and artificial neural network (ANN) models, and developed an interactive dashboard of the populations of interest.
Team members: Kim Lowell, Caroline Lavoie, Gowri Neeli, Thomas Cook, Suzannah Hicks, Michael Gryncewicz
Like many other universities, the University of New Hampshire is concerned with increasing enrollment yield. Yield is the percentage of admitted students who choose to attend the University. To examine factors related to student yield, the team used a number of analytic techniques during the course of this project. Cluster analysis was utilized to segment students into different profiles across the dataset. For supervised learning, we used logistic regression, random forest models, and artificial neural networks as classifiers to predict whether or not a student enrolls at UNH given that they are admitted.
Research shows that the pursuit of a better job remains the number one reason why freshmen choose to attend college. Universities must be increasingly aware of their responsibility to help students attain this goal. A cornerstone of this strategy at UNH is the Career and Professional Success office, which is committed to empowering UNH students to attain the knowledge and skills needed to succeed in their professional lives. While UNH alumni continue to have full-time employment and workplace engagement rates that are higher than the national average, the question remains: Are there things that UNH can do to increase student job-readiness and personal success? This project presents two solutions: 1) the use of predictive modeling techniques in Python and storytelling in Tableau to identify the relationships between student activity and post-graduation outcomes, and 2) the simulation of a non-siloed database containing student data with the power to identify opportunities for intervention and support that can impact student success over time.
Team members: Connor Reed, Nemshan Alharthi, Olufisayo Dada, Jacob Daniels, Christine Hanson, Amanda Fakhoury
Avant Course machine learning. How can hyper-local data collection be refined for new product design? Avant Course set a goal to refine road bump detection methods and implement road classifications in order to identify optimal driving routes for electric and autonomous vehicles. With road classifications in place, electric vehicles can extend their range by avoiding extremely bumpy or unkempt roads. To achieve this goal, multiple functions were created. Geospatial Python packages were used for calculating distance driven, mapping road IDs, and visualizing drives and detected bumps. iPhone gyroscope data was used to detect when a phone is being moved by a user rather than registering a bump in the road. Unfortunately, if a user moves their phone without turning it along any axis, the movement is not detected as a user movement but is instead treated as a car movement. A neural net was trained to predict whether a detected bump was due to user movement. As expected, the model predicted no user movement because the response variable was so sparse. To reduce the data transmitted by users and the amount of data stored, a function was created to limit the number of readings per second. Limiting the number of readings not only reduces the amount of data stored but also speeds up processing when the data is used. This information is then used to detect and categorize bumps; as a result, roads are classified into categories ranging from safe to dangerous.
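A function that limits the number of readings per second, as described above, might look like the sketch below. The source does not describe the actual implementation; the function name, the `(timestamp, value)` reading format, and the choice to keep the first readings in each whole second are assumptions for illustration.

```python
import math

def limit_readings(readings, max_per_second):
    """Keep at most `max_per_second` readings within each whole second.

    readings: iterable of (timestamp_in_seconds, value) pairs in time order.
    The earliest readings in each second are kept and the rest are dropped.
    """
    kept = []
    counts = {}  # whole second -> number of readings kept so far
    for timestamp, value in readings:
        second = math.floor(timestamp)
        if counts.get(second, 0) < max_per_second:
            kept.append((timestamp, value))
            counts[second] = counts.get(second, 0) + 1
    return kept

# Hypothetical gyroscope samples: four in the first second, three in the next.
samples = [(0.0, 1), (0.2, 2), (0.4, 3), (0.9, 4), (1.1, 5), (1.5, 6), (1.6, 7)]
print(limit_readings(samples, max_per_second=2))
```

Keeping the first readings in each second is the simplest policy; averaging the readings within each second would be an alternative that preserves more of the signal at the same storage cost.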
Team members: Jolanta Grodzka, Serina Brenner, Michael Gryncewicz, Amanda Fakhoury, Meseret Tekle, Daniel Walsh
Arkatechture is a data analytics company that has identified financial institutions (FIs) as a key client group. Arkatechture wishes to increase its client base by using publicly available statutory reporting information. Moreover, Arkatechture has to date focused on credit unions (CUs) and would like to expand its bank clientele.
This project is primarily one of data architecture and manipulation as opposed to one of data analysis. Our task is to facilitate the use of FI data rather than analyze the data to find, for example, under-performing FIs. This consisted of five tasks. 1. Development of the over-arching data flow. 2. Database development. 3. Conformance and KPIs. 4. Data Cleaning. 5. Data Visualization.
At each step different tools and techniques were employed. Amazon AWS and Redshift were utilized for the database component and for KPI construction. Other tools in Python, R, and Tableau were utilized throughout.
Team members: Michael Shanahan, Brandon Bryant, Katharine Cunningham, Kim Lowell, Jolanta Grodzka
This project explores the NH State Police crash dataset of all commercial vehicle crashes to better understand the causes of commercial vehicle crashes and the role of distracted driving. The National Institute for Occupational Safety and Health (NIOSH) has focused attention on motor vehicle crashes as the #1 cause of work-related injury in the U.S. Currently, NIOSH provides evidence that distracted driving is a major cause of commercial vehicle crashes. To explore the characteristics of the at-fault drivers, we applied an unsupervised technique, specifically clustering, to describe driver behavior. We used a Random Forest model to determine the most likely causes of accidents and the likelihood of injury given an accident.
Team members: Brandon Epperson, Suzannah Hicks, Michael Shanahan, Patrick Kispert, Joy Lin
Darling Consulting Group (DCG) focuses on asset liability management (ALM) services in an attempt to mitigate risk while ensuring financial stability for their clients. This project sought to create a model to predict commercial loan prepayments for small to mid-size banks. Anticipating loan prepayments allows a bank to better plan for future cash flows, structure their balance sheet, and prepare for regulatory oversight. The practicum teams explored numerous methods to predict loan prepayment using a data set including a time series of loan payments from six anonymous regional banks. The predictive methods include artificial neural networks (ANN), recurrent neural networks (RNN), time-series analysis (ARIMA, Exponential Smoothing), and random forest. Additionally, the team explored descriptive details of the loans and the external economic factors impacting prepayment behavior.
Team Members: Katharine Cunningham, Steven Glover, Julia Vaillancourt, Thomas Cook, Nisha Muthukumaran, Connor Reed, Philip Bean
2016- 2017 Practicum Projects
Team members: Joan Loor, John Kelley, Robin Marra, Yitayew Workineh
Description: Granite State College (GSC) was established in 1972 by the Trustees of the University System of New Hampshire as the School for Continuing Studies. The Mission of Granite State College is “to expand access to public higher education to adults of all ages throughout the state of New Hampshire.” To fulfill that goal GSC has five full-service Regional Campuses and three Academic Sites.
This project focused on the identification of factors that affect the prospects for academic success at Granite State College. To that end, GSC provided us with de-identified student data regarding demographics, enrollment patterns, academic performance, and financial aid status. In addition, we created many new features that were important to our analyses. Employing random forests and clustering we identified five distinct groups of GSC students. Subsequent survival analysis enabled us to isolate several factors that either decreased or increased the propensity of students to drop out. Based upon our findings we recommend several warning indicators that GSC can utilize to enhance student retention.
Team members: Jamie Fralick, John MacLeod, Erica Plante, Swapna S
Description: Martin’s Point Health Care’s mission is to provide better care at lower costs in the communities they serve. The team was tasked with developing a new method to predict groups of members who might be at increased risk of experiencing a major medical event, enabling Martin’s Point to proactively reach out to those patients with preventative care offers in hopes of sparing them a future medical event.
Using de-personalized medical and pharmacy claims data provided by Martin’s Point, the team performed an observational study. Cluster Analysis techniques were used to segment members into several groups with similar claim profiles. The team then used Survival Analysis over the observed timeframe of the data to assess the likelihood that each cluster would incur a major claim during that timeframe. An interactive dashboard was created to allow Martin’s Point to drill down into the clusters at relatively higher risk of a major claim and reach out to those members with preventative care.
Team Members: Adetoun Adeyinka, Hailey Bodwell, Richa Kapri, and Mengying Xu
Description: CA Technologies is an international, publicly held corporation that ranks as one of the largest independent software companies in the world. CA Technologies creates systems software that runs in distributed computing, mainframe, virtual machine, and cloud computing environments. This practicum project focused on developing an analytical model to more efficiently map CA’s sales teams to potential customers. The project used sales-related data, including current and historical contract information, client data, and usage activity. A variety of clustering techniques were utilized to statistically group clients based on common characteristics. Within each cluster, optimal buyers of product groups were identified. These optimal buyers were used to develop propensity scores that sales teams are now able to use to target products to the clients most likely to purchase.
Team Members: Logan Mortenson, Shane Piesik, Soumya Shetty
Darling Consulting Group (DCG) is one of the largest asset liability management firms in the United States, helping banks and credit unions manage balance sheets effectively.
Description: This project sought to determine core bank customers versus rate shoppers for specific bank clients. DCG gave us a 16 GB data folder including different files for each month from 2004 until August 2016. Through feature engineering and data transformation, the team was able to manipulate the data in order to find similarities in different customer accounts for this bank. The solution to DCG’s dilemma was two-fold. The team created a Support Vector Machine algorithm in order to predict customer dropout rate from the bank and then used survival analysis in order to determine a safe prediction period of when a customer is going to leave the bank. The information can be used to alert the bank as to when they should reach out to important customers who may be on the verge of leaving.
Team members: Colin Cambo, Pujan Malavia, Austin Smith, Benjamin Tasker
Description: UNUM insurance is a Fortune 500 provider of insurance protection for 33 million persons worldwide. The primary goal of this project was to create workflow optimization for the thousands of emails UNUM receives per day. A secondary task was to experiment with sentiment analysis to see if sender mood could be obtained in real time. The data set provided contains 2 million emails from two different databases.
The team used Python for all coding purposes and used regular expressions to clean the emails, removing subject lines, confidentiality statements, etc. This methodology helped the UNUM group classify the emails into several groups with 70% accuracy. If UNUM were to implement the email model provided, savings for the customer service department alone are estimated at approximately $3 million.
Team Members: Bethany Bucciarelli, Shannon Snively, Minh Ly, Phi Nguyen
Description: This project examined a multitude of factors related to driving risk, including weather, road surface conditions, and auto accidents en route. The goal is to establish a measure of a person’s likelihood of encountering risk along a given route. In addition, the team was asked to validate a weather severity score previously developed by winningAlgorithms. To study auto accidents, data from the US Fatality Analysis Reporting System (FARS) was used to analyze the prevalence, cause, and risk associated with fatal car accidents. Attributes included the number of people involved, the number of people who died, weather, time of day, and time of year, plus additional factors, to determine an overall probability of someone encountering an accident along a route as the road’s severity score. Additionally, weather data collected by wA’s avantCourse was gathered and used in conjunction with FARS. Findings indicate that during certain weather conditions, such as clear summer days, accident prevalence is greater than at any other time or in any other weather condition. Additional attributes and measures are being tested to develop the most reliable model.
Team Members: Kevin Rossi, Zachary Porcelli, Arber Isufaj, & Suofeiya Yin
Description: This project provides Elliot Hospital with financial insight into its Case Mix Index (CMI). Elliot Hospital is looking for insight into the causes of fluctuations in its CMI. The M.S. student team sought to develop a solution with which the hospital could better analyze past patients in order to budget correctly for the future. With data provided by Elliot Hospital, the team developed a dashboard that allows Elliot to view characteristics such as CMI, Length of Stay, Paid Amount, and Month by selecting three features at a time. When certain features are selected, such as Department, Payer, and Diagnosis Related Group, the dashboard produces a calculated CMI based on the selected inputs. This gives Elliot the ability to dig down and pinpoint the areas of the health system that are showing fluctuation in the CMI. With this dashboard, Elliot Hospital will be able to prepare better for future patients’ needs.
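The calculated CMI for a selection can be illustrated with a short sketch. It relies on the standard definition of Case Mix Index as the average DRG relative weight across discharges; the record layout, field names, and values are hypothetical and do not reflect Elliot Hospital's data or the team's dashboard logic.

```python
def case_mix_index(discharges, **filters):
    """Average DRG relative weight over discharges matching the filters.

    discharges: list of dicts, each with a "drg_weight" entry plus
    descriptive fields such as "department" or "payer".
    Returns None when no discharge matches the selection.
    """
    selected = [
        d for d in discharges
        if all(d.get(field) == value for field, value in filters.items())
    ]
    if not selected:
        return None
    return sum(d["drg_weight"] for d in selected) / len(selected)

# Hypothetical discharge records.
discharges = [
    {"department": "Orthopedics", "payer": "Medicare", "drg_weight": 1.0},
    {"department": "Cardiology", "payer": "Medicare", "drg_weight": 2.0},
    {"department": "Cardiology", "payer": "Commercial", "drg_weight": 3.0},
]
print(case_mix_index(discharges))
print(case_mix_index(discharges, department="Cardiology"))
print(case_mix_index(discharges, department="Cardiology", payer="Medicare"))
```

Comparing the filtered values against the overall figure is one way to see which departments or payers pull the index up or down.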
2015- 2016 Practicum Projects
The specific goal was to predict the duration of short-term disability claims upon intake in order to optimize claim assignment and to resolve claim cost. The team also investigated introducing new technologies and methodologies to the traditional analytic approach, with the potential to substantially shorten the analytics Q&A and deployment timeframe.
Team members: Kofi Ebakyea, Alex Booth, Alissa Andrews
This project proposed to derive a method for identifying associations between mental disorders and physical comorbidities using patient segmentation models based on patient demographics, metrics for resource utilization, and historical claims data. An interactive geospatial dashboard was further developed to optimize the location of new care centers.
Team members: Justin Greenberg, Jon Vignaly, Pritti Joseph
This project examined the medical nature of the opioid epidemic in New Hampshire using the NH Comprehensive Health Care Information System (CHIS) to look at county-based rates and trends in prescribing (opiates, treatment & blockers), mortality by drug use, diagnoses, and SUDS (substance abuse disorder) within the state. Outcomes provided county-based analysis of prescribing and opioid use as well as treatment. In addition, the outcomes exposed the value and limitations of public use data in performing this type of inquiry and suggested policy change and further research for improved future analysis.
Team members: Adrienne Martinez, Carol Page
The UNH project defined factors related to student success at UNH. There were three primary objectives of this project: 1) To create segments of various kinds of undergraduate students; 2) to quantify predictors of success amongst those segments; and 3) to quantify psychographic predictors of success. Data included academic histories of students for five years, the first destination student survey, and primarily derived psychometric survey data conducted on UNH
Team 1 members: Derek Naminda, Yuyu Zhou, Alyssa Cowan
Team 2 members: Rachel Cardarelli, Kevin Stevens, Chris Dunleavy