Don't be Hit by the Analytics Backlash

One of my typical activities in May is to teach the Analytics Academy, a short program offered to Harvard Ph.D. students and graduates from the School of Arts and Sciences. The session is offered through the Office of Career Services and explores how Ph.D.s from various fields can secure jobs and prosper outside of academia in the fields of analytics, big data, and artificial intelligence. Since that same office provided me with some very useful business orientation when I was seeking a (partially, as it turns out) nonacademic career, I am always happy to return the favor.

I’ve been doing this program for almost a decade, and the students have always been very enthusiastic and positive about analytics. But this year was different. Most of the questions and comments were about the negative aspects of analytics and AI. They focused on issues like how to prevent algorithmic bias, what industries would be least likely to do harm with analytics, and how to reduce the societal damage from AI.

To be honest, I wasn’t really prepared for these questions and comments, and one of my purposes in writing this post is to encourage you to prepare better than I did. Fortunately, I had recently written a book chapter that addressed some of these issues, so I wasn’t totally stuck for answers. But I think that all of us in the analytics field need to be prepared both to defend analytics and AI where appropriate, and to ensure that we’re not contributing to the problems the students raised. You may think this is easy, but it’s not.

For better or worse, algorithmic bias, which was first raised prominently in Cathy O’Neil’s book Weapons of Math Destruction, is at the heart of the analytical enterprise. Algorithmic bias has been widely criticized, for example, in judicial sentencing algorithms, where some models seem to have racial biases. Most companies would decry that injustice, but many of their models also discriminate on the basis of factors that their customers cannot change. Older people are more likely to die soon, so actuarial models have long suggested that they be charged more for life insurance. Women live longer than men on average, so their disability insurance costs more. Young people crash cars more frequently than older ones, so rental car companies make it difficult for those under 25 to rent. These long-practiced algorithmic predictions typically work well for business purposes, but they could come under attack in the analytics backlash as well.

Given the increased consciousness about algorithmic bias among the public, companies should be prepared to explain and defend the variables they use in their models. This is particularly true in highly regulated industries like financial services and health care, but it is rapidly becoming an issue in every industry. In the European Union, for example, one of the many aspects of the General Data Protection Regulation (GDPR) that took effect this month is the “right to an explanation” for the logic behind an automated decision. This may increase pressure not only for more transparency, but also for less use of demographic variables in models.

Organizations that create models with application in the public domain should be particularly concerned. I’ve already mentioned sentencing algorithms. The COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) system from Northpointe, Inc. is used for sentencing recommendations in criminal cases. It uses a relatively basic form of machine learning (it is trained on data to develop a scoring system that is then applied to new data), attempts to predict the likelihood of recidivism, and is one factor used by judges in recommending sentences for convicted offenders. It was found in a ProPublica study to predict that black defendants will have higher risks of recidivism than warranted, and to predict lower rates for white defendants than warranted. Northpointe has argued that ProPublica’s analysis is incorrect.
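For readers who want to see what a disparity check of this kind involves, here is a minimal sketch. It is not ProPublica's actual method, and it does not use COMPAS data. It assumes a hypothetical list of records, each holding a group label, whether the model flagged the person as high risk, and whether the person later reoffended. It then compares false positive and false negative rates across groups. The function name, the group labels "A" and "B", and every number in the toy data are invented for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is a tuple (group, predicted_high_risk, reoffended).
    This record layout is an assumption made for this sketch.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, predicted, actual in records:
        if predicted and actual:
            counts[group]["tp"] += 1
        elif predicted and not actual:
            counts[group]["fp"] += 1  # flagged high risk, did not reoffend
        elif not predicted and actual:
            counts[group]["fn"] += 1  # flagged low risk, did reoffend
        else:
            counts[group]["tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Made-up toy data, not drawn from any real system.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", True, True)] * 8 + [("A", False, True)] * 2
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", True, True)] * 6 + [("B", False, True)] * 4
)

rates = error_rates_by_group(toy_records)
for group in sorted(rates):
    fpr, fnr = rates[group]
    print(f"group {group}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

In this invented data, group A is wrongly flagged as high risk twice as often as group B, while group B is wrongly flagged as low risk twice as often as group A. A gap of that shape is what a bias claim like the one above rests on. Whether it is the right fairness measure is part of what the two sides dispute.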

While the firm that created COMPAS is clearly research-oriented and highly analytical, the debate points out a problem related to algorithmic bias: lack of transparency. For competitive reasons, Northpointe refuses to release the algorithm it uses for scoring defendants. This is somewhat understandable, but as a result, defendants and their attorneys, judges, and observers of judicial processes are unable to fully assess this (at least partial) basis for sentencing. At least one other sentencing algorithm has been made public, and it has even improved the likelihood of alternatives to jail for many defendants. Called the Public Safety Assessment, it was developed by a private foundation, and the nine factors and weights used within its model are publicly available. The foundation is also commissioning third-party research on the impacts of the assessment.

These developments suggest that algorithms and AI programs used for public decision-making purposes may not be well suited for creation and marketing by the private sector. At a minimum, profit-making companies creating models for the public sector would seem to have some obligations to make their algorithms public under some circumstances. And even companies outside of the public sector should be discussing how to make their models more transparent and fair.

Of course, AI will compound some of these difficulties. I suspect that Facebook’s challenges in using AI to filter out hate speech or fake news are closely related to the students’ concerns. But that is only one aspect of the AI issue. For example, as less transparent algorithms like deep learning neural networks become more widely employed, it may be impossible to employ them in important public contexts because no one could understand how they arrived at a decision. The AI issue is too big to discuss in this context, however; I will cover it in another post. In the meantime, start preparing to answer the types of difficult questions I faced from these smart Ph.D. students at Harvard.