Good bots, bad bots: Using AI and ML to solve data quality problems

More than 40% of all website traffic in 2021 wasn’t even human.

This may sound alarming, but it’s not necessarily a bad thing; bots are core to running the internet. They make our lives easier in ways that aren’t always obvious, such as delivering push notifications about promotions and discounts.

But, of course, there are bad bots, and they account for almost 28% of all website traffic. From spam and account hijacking to personal information gathering and malware, it’s usually the way people deploy bots that separates the good from the bad.

With the roll-out of widely accessible AI like ChatGPT, it will become harder to tell where the bots end and the humans begin. These systems are getting better at reasoning: GPT-4 passed the bar exam in the top 10% of test takers, and bots have even beaten CAPTCHA tests.


In many ways, we may be approaching a critical mass of bots on the internet, and that could be a serious problem for consumer data.

Existing threat

Companies spend about $90 billion on market research each year to decipher customer trends, behaviors and demographics.

But even with this direct line to the consumer, the failure rate in innovation is enormous. Catalina projects that the failure rate of consumer packaged goods (CPG) is a dreadful 80%, while the University of Toronto found that 75% of new grocery products fail.

What if the data these creators rely on is riddled with AI-generated responses and doesn’t really represent consumers’ thoughts and feelings? We would live in a world where businesses lack the fundamental resources to inform, validate, and inspire their best ideas, sending failure rates skyrocketing. That is a crisis they can ill afford right now.

Bots have been around for a long time, and much of market research has relied on manual, instinct-driven processes to analyze, interpret, and weed out such low-quality respondents.

But while humans are excellent at bringing reason to data, we can’t tell bots from humans at scale. The reality for consumer data is that the emerging threat of large language models (LLMs) will soon overtake the manual processes we use to identify bad bots.

Bad bot, meet good bot

While bots can be a problem, they can also be the answer. By creating a layered approach using AI, including deep learning or machine learning (ML) models, researchers can create systems to separate out low-quality data and rely on good bots to run them.

This technology is ideal for detecting subtle patterns that humans can easily overlook or misread. And if managed properly, these processes can feed ML algorithms that continuously evaluate and clean data, so that quality holds up against AI.

Here’s how:

Create a measure of quality

Instead of relying solely on manual intervention, teams can ensure quality by creating a scoring system that identifies common bot tactics. Building a measure of quality requires subjective judgment. Researchers can set guardrails for responses across several factors. For example:

  • Spam probability: Are the responses made up of inserted or cut-and-pasted content?
  • Gibberish: A human answer may contain a brand name, proper noun, or misspelling, but it generally still adds up to a coherent answer.
  • Skipped recall questions: While AI can adequately predict the next word in a sequence, it cannot reproduce personal memories.

These data checks can be subjective; that’s the point. More than ever, we need to question data and build systems to standardize quality. By applying a scoring system to these characteristics, researchers can aggregate scores and discard low-quality data before moving on to the next layer of checks.
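As a rough illustration of such a scoring system, the sketch below combines the three checks above into a single weighted risk score and drops responses that cross a cut-off. Everything in it is an assumption made for the example rather than a method described in this article: the `Response` fields, the weights, the threshold, and the deliberately crude heuristics (exact-copy detection for spam, vowel-free words for gibberish).

```python
from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    open_text: str         # free-text answer to an open-ended question
    recall_answered: bool  # did the respondent answer the personal-recall question?


# Hypothetical weights and cut-off; in practice these are judgment calls.
WEIGHTS = {"spam": 0.4, "gibberish": 0.3, "recall": 0.3}
THRESHOLD = 0.5


def spam_score(text, seen_texts):
    """1.0 if the answer is an exact, case-insensitive copy of an earlier one."""
    return 1.0 if text.strip().lower() in seen_texts else 0.0


def gibberish_score(text):
    """Share of words containing no vowel (y counts as a vowel); 1.0 if empty."""
    words = text.split()
    if not words:
        return 1.0
    no_vowel = sum(1 for w in words if not any(c in "aeiouy" for c in w.lower()))
    return no_vowel / len(words)


def skipped_recall_score(resp):
    """1.0 if the recall question was skipped, otherwise 0.0."""
    return 0.0 if resp.recall_answered else 1.0


def risk_score(resp, seen_texts):
    """Weighted sum of the three checks; higher means more bot-like."""
    return (WEIGHTS["spam"] * spam_score(resp.open_text, seen_texts)
            + WEIGHTS["gibberish"] * gibberish_score(resp.open_text)
            + WEIGHTS["recall"] * skipped_recall_score(resp))


def filter_responses(responses):
    """Keep responses whose risk score is below THRESHOLD, in submission order."""
    kept, seen = [], set()
    for resp in responses:
        score = risk_score(resp, seen)
        seen.add(resp.open_text.strip().lower())
        if score < THRESHOLD:
            kept.append(resp)
    return kept
```

The weights are where the subjectivity lives: a team that sees mostly copy-and-paste fraud would weight the spam check more heavily, while another might care more about skipped recall questions.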

Look at the quality behind the data

With the rise of human-like artificial intelligence, bots can slip through the cracks if quality scores are the only check. This is why it is imperative to layer these signals with the data around the output itself. Real people take the time to read, re-read, and analyze before answering; bad actors usually don’t, which is why it’s important to look at the response level to understand trends among bad actors.

Factors such as response time, repetition, and insightfulness can go beyond the surface level to analyze the nature of responses in depth. If responses are too fast, or nearly identical responses are recorded across one survey (or several), it could be a sign that the data is of low quality. Finally, going beyond nonsensical answers to identify the elements that make up an insightful answer, by looking critically at the length of the answer and the string or number of adjectives, can weed out the lowest-quality answers.

By looking beyond the obvious data, we can establish trends and build a consistent model of high-quality data.
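To make these response-level signals concrete, here is a minimal sketch that flags answers for speed, near-duplication, and low insight. It uses `difflib.SequenceMatcher` from the Python standard library for the similarity check. The thresholds, the function names, and the tiny adjective list are placeholders invented for the example; a real system would likely use a proper part-of-speech tagger and thresholds tuned on its own data.

```python
from difflib import SequenceMatcher

# All of these values are illustrative assumptions, not recommendations.
MIN_SECONDS = 5.0       # answers faster than this look automated
SIMILARITY_LIMIT = 0.9  # similarity ratio at or above this counts as near-identical
MIN_WORDS = 5           # shorter answers are treated as low insight
ADJECTIVES = {"fresh", "sweet", "cheap", "expensive", "bland", "creamy", "healthy"}


def too_fast(seconds):
    """True if the answer arrived faster than a person could plausibly type it."""
    return seconds < MIN_SECONDS


def near_duplicate(text, earlier_texts):
    """True if the answer is almost identical to any earlier answer."""
    return any(
        SequenceMatcher(None, text.lower(), other.lower()).ratio() >= SIMILARITY_LIMIT
        for other in earlier_texts
    )


def insightful(text):
    """Crude proxy: long enough and contains at least one listed adjective."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return len(words) >= MIN_WORDS and any(w in ADJECTIVES for w in words)


def behaviour_flags(answers):
    """answers: list of (text, seconds) in submission order.

    Returns one set of flag names per answer.
    """
    flags, earlier = [], []
    for text, seconds in answers:
        found = set()
        if too_fast(seconds):
            found.add("too_fast")
        if near_duplicate(text, earlier):
            found.add("near_duplicate")
        if not insightful(text):
            found.add("low_insight")
        flags.append(found)
        earlier.append(text)
    return flags
```

Returning named flags rather than a single pass or fail keeps the signals available for later layers, for example as inputs to a model or as context for a human reviewer.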

Get AI to clean for you

Ensuring high-quality data is not a “set it and forget it” process; it requires consistently moderating and ingesting good (and bad) data to hit the moving target that is data quality. Humans play an integral role in this flywheel: they set up the system, sit on top of the data to detect patterns that affect the standard, and then feed these features back into the model, including rejected entries.
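One very small version of that feedback loop is sketched below. Instead of retraining a full ML model, it simply refits the discard cut-off from entries that human reviewers have labelled, rejected ones included. The function name and the `(risk_score, is_bot)` data shape are assumptions made for this example, and a single threshold stands in for the model here.

```python
def refit_threshold(reviewed):
    """Pick the risk-score cut-off that best matches human review decisions.

    reviewed: list of (risk_score, is_bot) pairs labelled by reviewers,
    including entries that were previously rejected.
    Entries scoring at or above the returned cut-off are treated as bots.
    Returns None if there is nothing to learn from; ties go to the lowest cut-off.
    """
    best_cut, best_correct = None, -1
    for cut in sorted({score for score, _ in reviewed}):
        correct = sum((score >= cut) == is_bot for score, is_bot in reviewed)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut
```

Each review cycle appends newly labelled entries to `reviewed` and calls the function again, so the cut-off moves as bot behavior changes.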

Your existing data is also not immune. Existing data should not be set in stone but should be subject to the same strict standards as new data. By regularly cleaning historical databases and benchmarks, you can ensure that every new piece of data is measured against a high-quality comparison point, opening up more agile and confident decision-making at scale.

Once these scores are in place, the strategy can be replicated across regions to identify high-risk markets where manual intervention may be required.
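A minimal sketch of that regional roll-up might look like the following: it computes the share of rejected responses per region and flags regions above a limit. The function name, the region codes used to exercise it, and the default limits are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def high_risk_markets(records, max_reject_rate=0.3, min_responses=3):
    """Flag regions whose share of rejected responses exceeds max_reject_rate.

    records: iterable of (region, was_rejected) pairs.
    Regions with fewer than min_responses entries are skipped as too small to judge.
    Returns a dict mapping each flagged region to its rejection rate.
    """
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for region, was_rejected in records:
        totals[region] += 1
        if was_rejected:
            rejected[region] += 1

    flagged = {}
    for region, count in totals.items():
        if count < min_responses:
            continue
        rate = rejected[region] / count
        if rate > max_reject_rate:
            flagged[region] = rate
    return flagged
```

Flagged regions are candidates for manual review rather than automatic exclusion, which matches the idea of reserving human intervention for the markets that need it.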

Fight nefarious AI with good AI

The market research industry is at a crossroads; data quality is deteriorating and bots will soon take up an even larger share of internet traffic. Time is short, and researchers should act quickly.

But the solution is to fight nefarious AI with good AI. This will set a virtuous flywheel spinning; systems become smarter as models collect more data. As a result, data quality continuously improves. More importantly, it means that companies can count on their market research to make much better strategic decisions.

Jack Millership is the data specialist at Zappi.
