There’s nothing more powerful, or more buzzworthy, than automation, machine learning, and AI in this age of technology and software. Every day there’s a bigger push to put machine learning (and AI) into products and services, and it’s only accelerating alongside big data analytics. They say data is the new gold, and I’m inclined to believe it. Whether we like it or not, every day we’re helping to feed this growing monstrosity.
You might wonder: how can a machine, which only does what it’s programmed to do, be bad? Or perhaps you already have some ideas, like intrusive advertisements and the like.
However, I don’t think that’s where we need to be concerned. The fact that we’re overtly aware of it and have begun tackling the issue through government, like with GDPR and the U.S. government’s scrutiny of Google and Facebook, means we can handle that problem… eventually.
The problems I’m concerned with are the more subtle things. The things that aren’t as apparent as the breaking-news headlines about how Google or Facebook is selling your data (again).
I’m talking about the bias and dangers in the data itself, in how it’s collected, and in the way it’s used. Let me give an example of each to be clear.
Let’s start with an example of “Criminal Machine Learning.” Take a moment and look at (a) and (b), and think about how machine learning might work on this dataset, considering that everyone in (a) is classified as a criminal and everyone in (b) as a non-criminal. What’s wrong here?
If we were to use a dataset gathered in such a way that most criminals look like (a) and most non-criminals look like (b), then the resulting prediction model is garbage. It would consistently pick out humans who have rougher facial features, are frowning, and aren’t wearing suits as criminals, while anyone who just dressed up a bit would be completely fine. This is not okay, and it’s a very good example of bias in the data, as the sketch below shows.
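To make this concrete, here’s a minimal sketch (Python with NumPy and scikit-learn; the features `frowning` and `suit` are entirely made up for illustration) of a classifier latching onto superficial traits because the dataset itself is biased:

```python
# A minimal sketch of a model learning spurious features from biased data.
# Synthetic dataset: each row is [frowning, wearing_suit, noise], and the
# labels encode a biased collection process, not actual criminality.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Biased collection: "criminals" (label 1) were photographed frowning and
# without suits; "non-criminals" (label 0) are mostly suited and smiling.
labels = rng.integers(0, 2, size=n)
frowning = (rng.random(n) < np.where(labels == 1, 0.9, 0.1)).astype(float)
suit = (rng.random(n) < np.where(labels == 1, 0.1, 0.9)).astype(float)
noise = rng.random(n)  # a feature carrying no signal at all
X = np.column_stack([frowning, suit, noise])

model = LogisticRegression().fit(X, labels)
print("weights (frowning, suit, noise):", model.coef_[0])

# The same hypothetical person, smiling in a suit vs. frowning without one:
print(model.predict_proba([[0.0, 1.0, 0.5]])[0][1])  # low "criminal" score
print(model.predict_proba([[1.0, 0.0, 0.5]])[0][1])  # high "criminal" score
```

The model never saw anything about actual criminality; it simply learned the dress code of the photos, which is exactly the bias baked into (a) and (b).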
Next up is bias in data gathering: ignoring the context of the gathering process itself.
For example, let’s say you run a poll about jobs among the rabbits and turtles on Island A, and it turns out that the turtles always go slow and steady and hold lower-paying, more risk-averse jobs than the rabbits. However, what you might’ve ignored is that the turtles living on Island A came from Island B, a place of pure volatility, and the turtles who moved to Island A were specifically looking for a different lifestyle.
This leads to bias in the actual data as well, and it can be very misleading when people with confirmation bias try to draw their own conclusions about the world from it, whether maliciously or not.
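As a toy illustration (a pure simulation with invented numbers, not data from any real poll), here’s how measuring only the self-selected turtles of Island A skews the result:

```python
# A toy simulation of the island poll: sampling only Island A's turtles,
# who self-selected for a calmer lifestyle, misrepresents turtles overall.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: turtles as a species span the whole range of
# risk tolerance, averaging around 0.5.
all_turtles_risk = rng.uniform(0, 1, size=10_000)

# Only the risk-averse turtles fled volatile Island B for Island A,
# so the Island A sample is heavily self-selected.
island_a_sample = all_turtles_risk[all_turtles_risk < 0.3]

print(f"true mean risk tolerance:  {all_turtles_risk.mean():.2f}")  # ~0.50
print(f"mean measured on Island A: {island_a_sample.mean():.2f}")   # ~0.15
```

The poll isn’t lying about the turtles it reached; it’s just silently answering a much narrower question than the one you think you asked.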
Lastly, one of the more obvious problems to address is the usage of machine learning in a deliberately biased way. I always find these misleading graphs extremely funny (check the credit to see more), and they serve as a reminder to be careful when just glancing at the titles of articles and pictures without properly researching. Sometimes these clickbait-style headlines and pictures can get stuck in your head and misinform you.
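If you’ve never looked closely at how these graphs work, here’s a sketch of one classic trick, the truncated y-axis, using matplotlib and made-up numbers:

```python
# The same made-up data plotted twice: once honestly, once with a
# truncated y-axis that makes a 1% difference look like a landslide.
import matplotlib.pyplot as plt

products = ["Ours", "Theirs"]
scores = [101, 100]  # a 1% difference

fig, (honest, truncated) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(products, scores)
honest.set_ylim(0, 110)  # axis starts at zero: bars look nearly identical
honest.set_title("Honest axis")

truncated.bar(products, scores)
truncated.set_ylim(99.5, 101.5)  # axis starts near the top: "twice as good!"
truncated.set_title("Truncated axis")

plt.tight_layout()
plt.show()
```

Same data, same bars; only the axis changed.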
Just like how people can misuse statistics like the above, people can misuse ML models as well. It’s very easy to train a model that gives you bad outputs and classifications, as we saw above, and malicious people can put them to bad use, whether it’s misleading people by claiming “the computer did it, so it can’t be wrong,” or deepfakes that trick people into thinking a fake person is a real one saying something. ML has many applications that can be game-changing, both for the better and for the worse.
Here’s a funny deepfake video that shows the power of ML, but also how surprisingly… tricky these can be to notice when they’re not made so obvious.
Personally, I believe bias in the data is by far the most dangerous and hardest-to-overcome problem in the field of machine learning and A.I. The problem is that humans are inherently biased no matter how hard they try to avoid it. In fact, even if you conduct perfect data gathering and processing, trying your best to avoid all the pitfalls, you can still end up with a useless model. Just think about how, everywhere in the world, there are people with their own biases and personal experiences, people being persecuted, people being censored or suppressed. There’s just no way to avoid it, and one should always be extremely vigilant and careful with the conclusions they “draw from data,” as real life is never black or white the way it can be in ML; it’s usually just shades of grey.
Thanks for your time, and I hope you learned something or at least got a chuckle out of it! 😊