Data engineering set for AI change

by PBT Group | May 28, 2024

Julian Thomas, Principal Consultant at PBT Group

Artificial intelligence (AI) will fundamentally transform the data architecture and engineering professions. According to research, global business adoption of AI stood at 35% in 2022, up from 31% the year before, and it shows no sign of slowing down.

Incorporating AI into operations offers significant opportunities for disruption, but the effectiveness hinges on trust. Traditionally, businesses have built machine learning models using secure, internal data sourced from their systems or acquired through reputable data providers like credit bureaus. This established practice ensures both the security and reliability of the data.

Internet sourcing

However, the landscape is shifting with modern AI applications, where there is a growing reliance on data sourced from the Internet. This trend raises concerns about the accuracy and validity of the data, emphasising the adage, ‘garbage in, garbage out.’ As businesses increasingly need to rely on AI models that use core internal data for training, the governance and security of this data come into sharper focus. It is therefore crucial for companies to navigate these diverse data sourcing scenarios carefully, ensuring robust data management practices are in place to maintain trust in AI-driven operations.

While AI and deep learning models are designed to improve over time, this does not necessarily guarantee the accuracy or relevance of the outputs when relying on freely sourced Internet data. These models refine their algorithms based on patterns and feedback, which allows them to evolve and enhance their performance gradually.

It is important to recognise that AI does not inherently distinguish between ‘good’ and ‘bad’ data. The criteria used by AI to evaluate data remain largely opaque and are not clearly auditable. This presents a challenge in ensuring the relevance and correctness of the information. For example, if you request instructions for building a six-seater table intended for adults, the model might provide accurate instructions, but for a child-sized version. This demonstrates that while the information may be correct, it might not always be relevant to the user’s specific needs. Therefore, as AI continues to advance, it is crucial to question and scrutinise the basis on which it processes data and improves its outputs.

Another challenge of using Internet-sourced data for AI applications concerns the provenance and ownership of that information. While data found on the Internet may not typically be protected by stringent regulatory requirements, the reverse scenario poses a greater risk. Businesses must be vigilant when supplying their protected data to cloud-based AI services. Concerns include how the data is secured, whether it is being used to train other models that could benefit third parties, and overall data privacy.

On the other side, AI platforms themselves face potential legal risks, such as copyright infringement, when they use externally supplied data for building models. This could occur if the data, whether sourced from services or directly from the Internet, is used in ways that inadvertently benefit other stakeholders without proper authorisation. Therefore, businesses must consider both the security of their data and the compliance obligations governing it when engaging with AI technologies.

Strategic alignment

AI, at least for the time being, is not a technology that can or should be left to its own devices. Instead, it is a tool that can help data architects and engineers do their jobs more effectively. For instance, companies can use the technology to automate repetitive and time-consuming processes that require little to no human intervention, leaving their data specialists with more time to focus on strategic work.

Of course, AI should never be left untethered. How will the organisation monitor the AI decision-making process, and will human operators have any control over which AI capabilities can be implemented within the business? A company must therefore have significant safeguards in place to ensure that AI behaves ethically and safely.

In addition to these guardrails, the AI decision-making process must be aligned with the broader business strategy. Imagine if a purely AI-driven call centre for an insurance company targeting women suddenly decides to sell policies to men. While there may be no ethical or safety concerns around this decision, it does not match the organisational mission.

Companies must therefore consider how to govern the AI decision-making process to ensure it stays within the boundaries of how the business wants to operate.

Making practical sense

As with any technology, the use of AI comes down to doing the basics right. This means that when AI is used as part of a data engineering solution, specialists need to consider the technology for analysing data, anomaly and outlier detection, advanced matching for data quality analysis, natural language processing for extracting data from unstructured sources and performing sentiment analysis, recommendation systems, and fraud detection. These techniques have long been indispensable to data engineers. AI can simply be used to enhance them further.
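
To make one of these basics concrete, here is a minimal sketch of outlier detection on a pipeline metric. It uses the classic interquartile range (IQR) rule rather than a trained AI model, and is only an illustration of the kind of check that more advanced techniques build on. The function name `iqr_outliers`, the multiplier `k` and the sample row counts are assumptions made for this example, not something taken from a specific tool.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration only; k=1.5 is the conventional
    default for the IQR rule, not a tuned threshold.
    """
    if len(values) < 4:
        # Too few points to estimate quartiles in a meaningful way.
        return []
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Assumed sample data: rows loaded per day by a nightly batch job.
daily_row_counts = [1020, 998, 1005, 1011, 987, 1003, 15, 1009]
print(iqr_outliers(daily_row_counts))  # prints [15]
```

A check like this could flag the day on which only 15 rows arrived, so that an engineer can investigate before the data reaches a model or a report.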

Throughout all of this, the role of the data engineer will remain secure. AI will not replace the human skills of engineers or the unique knowledge they hold about the organisations in which they work. Instead, a data engineer working in conjunction with AI technology can refine models to increase performance, enhance analytical outputs, and improve insights on how best to make judgement calls around data usage.

Of course, this does not mean data engineers can sit back and relax. They must embrace a mindset of continual upskilling and learning. Given how technology is changing, the data engineer of the future will be someone who has a good blend of mathematical, statistical, computer science, and data skills and knowledge. It comes down to using the technology in conjunction with their own skills and experience to create data-driven solutions that are still compliant with all data governance requirements.