AI Data Sets and Learning Algorithms

(August 25, 2023)

Chris Roberts, aka Dr. Dark Web, joined Dr. Rebecca Wynn on the Soulful CXO. He is the award-winning, globally renowned CISO at Boom Supersonic and is considered one of the world's foremost experts on counter-threat intelligence. He has over 20 years of experience working in enterprise, industrial, and government segments, adjusting to evolving security threats. He gained global attention in 2015 for demonstrating aviation security risks that could allow attacks against flight control systems.

He is routinely invited to speak at conferences and is a regular commentator in the media. His ability to find holes in everything has helped companies, governments, and the public improve their safety over many years. He is a highly sought-after keynote speaker, serves on many advisory boards, and has dozens of published works. Additionally, he was recently awarded the Champion of Security award for inclusion at the RSA 2023 conference, which honors security leaders who organize their teams to accommodate diversity, equity, and inclusion principles to the fullest degree.

How AI Learns from 3 Different Data Sets: Open Web, Restricted Data and the Hidden Web

Artificial Intelligence (AI) data sets and learning algorithms are crucial in the age of AI. As AI systems become increasingly prevalent, it is essential to understand how they learn from data and the potential implications for trust and reliability. 

Dr. Rebecca Wynn and Chris Roberts share insights into the different types of data sets and learning algorithms, focusing on three main categories of data: the open, clear web; restricted-access data behind paywalls and registration walls; and the darker, hidden web. Each of these categories presents unique challenges and considerations for learning algorithms.

The open, clear web is the most accessible source of information and is what most people are familiar with when they think of searching on Google or browsing the internet. This data is widely available and can be easily accessed by AI systems. However, Roberts raises thought-provoking questions about the quality and reliability of this data, stating that it is essential to consider the credibility of the sources and the potential biases that may exist.

On the other hand, some data is not readily accessible to the general public. This includes information behind paywalls or registration walls and data stored on individuals' own systems. Accessing this data requires additional permissions, so it may not be as easily incorporated into learning algorithms. Wynn and Roberts emphasize the importance of understanding who built the engine and how often the data sets are refreshed and updated.

Lastly, the darker side of the web includes hidden networks such as Tor onion services and various other platforms. This type of data is often associated with illegal activities and presents significant challenges in access and reliability. Roberts details the difficulty of crawling and cross-referencing this data and the need to build relationships between disparate pieces of information.
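
To make the idea of cross-referencing more concrete, here is a minimal sketch in Python of how records from different sources might be grouped when they share an identifier. It is only an illustration and not a method described in the episode: the records, the "handles" field, and the link_records function are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records; sources, ids, and handles are invented for illustration.
records = [
    {"id": "open-1", "source": "open web", "handles": {"alice@example.com"}},
    {"id": "paywall-1", "source": "restricted", "handles": {"alice@example.com", "alice_w"}},
    {"id": "hidden-1", "source": "hidden web", "handles": {"alice_w"}},
    {"id": "open-2", "source": "open web", "handles": {"bob@example.com"}},
]


def link_records(records):
    """Group record ids that are connected, directly or indirectly, by shared handles."""
    # Index which records mention each handle.
    by_handle = defaultdict(list)
    for rec in records:
        for handle in rec["handles"]:
            by_handle[handle].append(rec["id"])

    # Two records are neighbors if they share at least one handle.
    neighbors = defaultdict(set)
    for ids in by_handle.values():
        for rec_id in ids:
            neighbors[rec_id].update(ids)

    # Walk the neighbor graph to collect connected groups.
    seen = set()
    groups = []
    for rec in records:
        start = rec["id"]
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


groups = link_records(records)
```

In this toy example, the open web record and the hidden web record share no identifier directly, yet they end up in the same group because the restricted record links them. Real data would be far noisier and would need fuzzy matching and source-credibility checks, which this sketch leaves out.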

Learning algorithms rely on these diverse data sets to train and improve performance. However, Roberts highlights the significant computational power required to process and analyze these data sets. It is not just about learning the data but also about understanding the relationships between different pieces of information. Building a robust data set that accurately represents the real world is complex and resource-intensive.

Trust and Accountability in the Age of AI

The episode also touches on the issue of trust and accountability in the age of AI. Dr. Wynn raises concerns about the right to be forgotten and the potential for incorrect or outdated information to persist. They discuss the challenges of correcting and sanitizing data, especially when it comes to sensitive corporate or customer information.

Trust in information is challenging in the age of AI. The abundance of sources, conflicting narratives, and potential biases of AI-powered systems make it difficult for individuals to determine what information to trust. The lack of accountability and recourse for errors or misinformation further undermines trust. Proactive security measures, collaboration and knowledge sharing, and establishing regulations and guidelines are necessary to address these challenges. It is crucial to ensure the responsible use of AI and prioritize the protection of individuals and society from the potential harms it can bring.

Listen to the full episode for insights on leadership skills, embracing resiliency, and bridging the gap between technology and business. Additionally, tune in to learn more about Roberts's fascinating background and expertise in cybersecurity. This episode is a must-listen!

Remember to subscribe to the Soulful CXO on your favorite platforms.  

Apple Podcasts   
Spotify  
Google Podcasts