- Name: Ross McGowan
- Years at CivicScience: 4.5
- Location: Pittsburgh, PA
- Job Title: VP of Data Science
- College Major: Economics at Cornell University
- Graduate Program: Master’s in Statistical Practice at Carnegie Mellon University
- Job Summary: Ross is responsible for the practical applications of CivicScience data. Within the company, he helps drive product development and ensures scientific discipline. He also works with clients and publishers on analytical projects, helping them to find the most important insights they are (and are not) looking for.
Q: When would you say that Big Data started going mainstream?
A: So I feel like I started noticing it late in the last decade, personally. There were three major events in my mind.
Of course there was the book Moneyball. I read that not long after it came out in 2003. That was – being a sports fan – a big thing. That wasn’t big data so much as just using data intelligently – more analytics.
Next, I definitely think the two Obama elections were important because Obama’s campaigns publicized the fact that they were using data to inform all sorts of decisions – voter outreach, where they allocated their resources, figuring out individual voters – how likely they would be to vote for Obama – those sorts of things. That was high-profile stuff.
And at the end of the last decade – smartphones popped up, and those are basically data machines.
I’d say I was definitely jumping on the wave a little, so to speak. I took stats in high school and liked it, but the fact that I thought this was a promising field was part of the reason I wanted to go to grad school for statistics.
Q: You saw promise in the field, but where did you look to quantify that decision?
A: It was really just the smartphone thing – just the fact that these new devices produced so much data. But also the fact that technology enables so many things to be quantified that were unable to be quantified in the past.
If you can quantify something, then by definition it’s producing data, and with more and more data, I felt like the ability to analyze that data would be a valuable skill.
Q: In your everyday life, do you take this emphasis on data in decision-making with you?
A: Yes. Put it this way. Last year I bought a house with my wife. When you’re interested in a house you don’t know exactly what to bid, and in our case, there was another bidder. So we placed a bid and they placed a bid. Keep in mind, we don’t know what their bid is. We have some impression of what the market value is, but then you have to try to assign a quantitative value to emotional things like how disappointed you would be if you didn’t get the house.
If you wouldn’t be disappointed at all, bid the market value, and if you don’t get it, walk away – you won’t care. But if you’d be very disappointed, then you should bid more than you think the house is worth, because ultimately you’ll be happy.
There’s no way to put a purely quantitative value on personal happiness or how much you would regret something – though there are behavioral economics studies that attempt to quantify this stuff – but I try to think in that way.
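The bidding reasoning above can be sketched as a toy calculation. Everything here is hypothetical for illustration – the dollar figures, the 0–1 disappointment score, and the 10% maximum premium are not from the interview:

```python
# Toy model of the house-bidding logic: start from the estimated market
# value and add a premium that grows with how disappointed you'd be to
# lose the house.
def choose_bid(market_value, disappointment):
    """disappointment is a subjective 0-1 score; the 10% maximum
    premium is an arbitrary illustrative assumption."""
    max_premium = 0.10 * market_value
    return market_value + disappointment * max_premium

# If losing wouldn't bother you at all, bid market value and walk away
# if outbid.
print(choose_bid(300_000, 0.0))
# If you'd be very disappointed, bid above your estimate of the value.
print(choose_bid(300_000, 0.9))
```

The point isn’t the particular formula; it’s forcing an emotional factor into a number so it can enter the decision at all.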
Q: What do you see as the biggest opportunities and challenges in the field of Data Science right now?
A: The biggest opportunities? It’s hard for me to comment on fields I don’t work in, but based on purely anecdotal personal experience and just reading articles here and there, it seems like there’s a lot that could be done with healthcare.
There are lots of opportunities because it’s a field in which access to broader pools of data would present huge opportunity. The Precision Medicine Initiative is very interesting, I think. If there were databases about certain types of diseases, treatments that worked and in which situations, lots of deep granular data about patients, that would be a huge opportunity. But then, the challenge is that anonymity is crucial. You can’t just have everyone’s health records out there, so what happens with that data? That’s one challenge.
Another challenge is that there’s a lot of misperception about how people like me want data to be used. There are a lot of people who are data-averse or analytics-averse. I observe this a lot with sports fans. There’s a lot of people who hate the use of analytics in sports. They say things like, “well you can never quantify a player’s heart,” or “you can’t quantify a player’s ability to make a one-handed catch.” Well – but you can try. You can try to quantify a player’s work ethic or the factors that contribute to it. Why wouldn’t you want to learn more? That’s kind of what bugs me.
Resistance to using data to inform decisions is fading over time. But there are still a lot of misperceptions that people like me want data to be the only factor in a decision – that we think the data is always perfect, or that we think the data tells the whole story. Any good statistician is aware, or tries to be aware, of ways that their analysis could be incomplete, and of factors they can’t quantify.
But maybe my biggest concern is a lack of transparency and knowledge about how data is used in algorithms. Things like age or race are often forbidden from being used when evaluating whether someone gets a mortgage, for example. But looking at our data every day, I know there are a great number of things beyond demographics that are strongly correlated with one’s age or race. Are those variables forbidden from being used in those models? I honestly don’t know, but I worry about the lack of transparency there, and about whether the laws in place cast a wide enough net to prevent things like discrimination from happening.
Q: Because the topic of privacy came up, what are your thoughts about the current debate surrounding consumer privacy – often talked about in tandem with big data?
A: First off, in terms of what data is collected from your devices and the internet, there’s not much privacy. So you just have to be careful. However, I think that from an ethical perspective, companies, and this seems like it’s starting to happen in some places, should have an opt-in data-sharing default. This touches on some behavioral economics concepts a little, too.
In other words, the default should be that your data is not shared everywhere, and you have to opt in. That would put the onus on companies to create a fair exchange.
Q: With so much data out there, how can the average person tell reliable data from unreliable data?
A: The source of the data is most important, I think. If it’s a trusted source or something that’s proven to be reliable over time, that’s important. From an academic sense and a data science perspective, there’s an onus on transparency and reproducibility: do they share their methods, or in some cases their code? Are the people who collected the data open to showing how they did it?
There’s also an inverse relationship between confidence/arrogance in your numbers and reliability. If people aren’t willing to be questioned, then they should be questioned. But if people are trying to be transparent about what they perceive as flaws in their data, that in itself is a good sign. If you’re irrationally confident in your numbers, you probably missed something and may have made a mistake.
The last thing I would say is that one data point probably doesn’t really mean anything. But, if you have 10 data points that are all pointing in the same direction, that’s meaningful.
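That last intuition can be made concrete with a quick back-of-the-envelope calculation: if each data point were pure noise, equally likely to point either way, ten out of ten agreeing on direction is very unlikely to happen by chance.

```python
# If each of 10 independent data points were pure noise (a 50/50 coin
# flip on direction), the chance that all 10 point the same way is:
p_same_direction = 2 * 0.5 ** 10  # "all up" or "all down"
print(p_same_direction)  # 0.001953125, i.e. about 1 in 500
```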
Q: What are the main traits of good statisticians you’ve worked with, here or elsewhere?
A: That’s sort of related to what I was saying: healthy skepticism, toward both any data you receive and any data you collect. Everybody’s biased. Everybody goes about solving a problem in a certain way, and you have to examine the way you yourself analyze data in the same way you would look for sources of bias in someone else’s data set.
There’s also the ability to think through a problem. Anybody can learn to program to a certain extent. Anybody can write up some quick R code to do a regression analysis. It’s more about thinking through the problem and thinking through the data that’s the important part – not memorizing how certain algorithms work. That’s tougher to find.
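The point that the mechanical part is easy holds in any language. A simple-linear-regression fit really is a few lines – here in Python with textbook formulas and made-up numbers, not code from the interview:

```python
# Ordinary least squares for a simple linear regression y = a + b*x.
# Mechanically trivial; the hard part is deciding whether a linear
# model, and these data, actually answer your question.
def ols_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx  # intercept
    return a, b

# Toy data generated roughly from y = 1 + 2x:
a, b = ols_fit([1, 2, 3, 4], [3.1, 4.9, 7.0, 9.0])
# The slope comes out close to 2 and the intercept close to 1.
```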
Q: What are the main clashing points you’ve noticed between those who think more scientifically and quantitatively and those who don’t?
A: Okay – this is an easy one. This is almost like a borderline catch phrase – but I’ve had lots of time to think about it. People think that statistics is basically providing certainty, but what statistics really is, is an attempt to quantify uncertainty. So, I think – and I understand this – for someone who isn’t a statistician or doesn’t have a quantitative background, it can be very frustrating that statisticians don’t give hard answers. We won’t say this is the way to act or this is what will happen – but that’s simply because that’s not how the world works.
Look at what happened with the election and the prediction models. I followed the election as a voter, but looked at the prediction models more as a statistician. People were mad at Nate Silver because he kept hammering on the fact that his model gave Trump a higher probability of winning than the other models did. Well, look what happened. He was still wrong, but he did a much better job of quantifying the uncertainty that’s inherent in the polls.
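One standard way to score probabilistic forecasts after the fact is the Brier score: the squared error between the stated probability and the 0/1 outcome, where lower is better. The probabilities below are invented for illustration, not Silver’s actual numbers:

```python
# Brier score: squared error between a forecast probability and the
# actual 0/1 outcome. Lower is better.
def brier(forecast_prob, outcome):
    return (forecast_prob - outcome) ** 2

# Hypothetical forecasts of "Clinton wins" (the outcome was 0):
cautious = brier(0.70, 0)        # hedged forecast
overconfident = brier(0.99, 0)   # near-certain forecast

# Both forecasts "picked the wrong winner", but the hedged one scores
# far better because it acknowledged the uncertainty.
assert cautious < overconfident
```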
If you’re thinking about uncertainty properly, then you’re thinking about the ways in which you could be wrong, so you’re not going to give firm recommendations. Many people want statistics to make a decision for them, but it’s more about helping them make a decision – which gets back to resistance to data.
The people who are resistant to data and analytics are like “well, I don’t want you making a decision for me” but that’s not the point. We want to help make better decisions, understanding that data isn’t the be-all-end-all.
Q: Lastly, with Big Data being so prominent, there are going to be a lot of math-averse people who will have to work around it in some way. What tips would you give them?
A: I think it’s a pretty safe assumption that people need to get more comfortable with data as data becomes ever more available, and as more organizations incorporate data into their decision-making. However, not everyone is going to be 100% comfortable analyzing data, and that’s fine. But there are going to be certain things that people are more comfortable with than others. If you’re a visual person, think about working with and learning about data from that angle.
Another thing would be – if you’re trying to get comfortable with data – to learn about it in the context of a field you’re naturally interested in. I mentioned Moneyball. Baseball is a classic entry point for a statistician. I’m not into baseball, but if you’re a baseball fan, you may want to learn about statistics through baseball stats. If there’s a hobby or field you have a natural interest in, try to learn about some quantitative aspect of that. That’s going to be more helpful for the non-statistician than reading The Elements of Statistical Learning. It will be a natural hook.
I think a failing of a lot of stats teaching is that it’s too reliant on teaching theory at first, rather than giving hard examples. Then people think statistics is boring and something they have to do, when statistics can be applied to almost anything and should be viewed as an opportunity to learn more.
Interested in reading more about the state of data and polling? Check out our CEO’s recent post on this election, and its consequences for polling in America. If you’re looking for something a bit lighter, check out our findings into voting preferences of the fictional Hogwarts Houses.