Environmental problems are complex, evolving issues that defy straightforward solutions. These challenges demand integrated data, yet current environmental data is scattered and hard to access. Artificial intelligence (AI), alongside strong metadata frameworks, is emerging as a powerful tool for breaking down barriers to data discovery. Here Professor David Topping outlines the importance of access to quality data in solving environmental problems, examines the barriers to that access, and suggests ways to unlock the potential for environmental data to inform real-world solutions in the UK.
- The current state of environmental data access is unsatisfactory and restricts the potential for solutions to environmental problems.
- AI is emerging as a powerful tool to break down barriers to data discovery, but a lack of AI frameworks and metadata standards is hindering progress.
- The NERC Digital Solutions Hub is a strong example of how AI and good metadata can be combined to improve environmental data access and inform solutions to ‘wicked’ environmental problems.
Access to environmental data is not FAIR
Data collection is integral to environmental science. Yet just as research efforts are siloed, so too are the digital infrastructures that house environmental data in the UK. Although a recent review suggested that environmental science compares favourably to other disciplines in terms of alignment with the FAIR data principles, our consultations with environmental data users across the UK revealed significant and persistent barriers to data access and use. Open data, while commendable, is not necessarily discoverable or automatically useful.
This presents a challenge for science and policy in the environmental domain. Many issues we face are known as ‘wicked problems’. These are complex, evolving issues that are hard to define, involve diverse and often conflicting interests, and defy straightforward solutions. Environmental problems are emblematic of this, as they sit at the intersection of ecological, social, economic, and political systems.
Such interconnected challenges demand equally integrated data. The current state of environmental data, which is scattered, unevenly curated, and often difficult to access, makes it hard to draw links between phenomena such as climate projections and health outcomes. The INSPIRE Regulations 2009 set out to counter this and facilitate better public access to spatial information across Europe, yet there is still much to be done to achieve this integration.
The role of AI and metadata in better data discovery
AI, and especially Large Language Models (LLMs), offers a transformative opportunity in how we search for and interact with data itself. Public attention has largely focused on the generative and conversational capabilities of LLMs, which have revolutionised search, discovery, and digital assistance. However, these same technologies can be harnessed for good to answer domain-specific questions that traditionally require expert triage and access to siloed data sources. Imagine visiting a platform and asking: “What data can help me understand the impacts of transport emissions on public health?” In the traditional model, a team of experts might unpack this into a series of scenarios, identify relevant datasets, and design an analytical pipeline, assuming they know where and how to access the right data in the first place.
A key technology enabling this shift is retrieval-augmented generation. This approach allows LLMs to augment their responses by pulling in relevant information from external sources, be it documents, datasets, or structured metadata. If metadata describing datasets are embedded using this approach, then even imperfect or incomplete descriptions can still be matched semantically with user queries. Suppose a user asks for data on health impacts from mould exposure: a complex combination of atmospheric and clinical science, environmental monitoring and social behaviour. Even if the metadata doesn’t explicitly mention mould but references composting emissions, an LLM might still identify the dataset by associating composting-related spores with respiratory outcomes, thanks to its semantic understanding. LLMs can also be used to improve metadata quality by supplementing, standardising or inferring missing fields. Of course, there are evolving barriers around public trust in results generated from such tools, as discovered through our recent research.
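The mould-exposure example above can be sketched in a few lines of code. This is a deliberately simplified illustration, not the DSH implementation: a hand-written table of related terms stands in for the semantic associations that a real embedding model or LLM would learn automatically, and the dataset descriptions and similarity measure are hypothetical.

```python
# A toy illustration of semantic dataset discovery. The RELATED_TERMS table
# is a stand-in for what an embedding model provides in a real
# retrieval-augmented generation pipeline: it links a query term like
# "mould" to associated concepts such as airborne spores.
RELATED_TERMS = {
    "mould": {"spores", "fungal", "damp"},
    "health": {"respiratory", "clinical", "asthma"},
    "composting": {"spores", "emissions"},
}

def expand(text):
    """Tokenise and add related terms, so semantically similar
    descriptions can overlap even without shared vocabulary."""
    tokens = set(text.lower().split())
    for t in list(tokens):
        tokens |= RELATED_TERMS.get(t, set())
    return tokens

def similarity(query, description):
    """Jaccard similarity between the expanded token sets."""
    q, d = expand(query), expand(description)
    return len(q & d) / len(q | d)

# Hypothetical dataset metadata: note neither description mentions "mould".
datasets = {
    "composting-emissions": "airborne spores from composting emissions monitoring",
    "traffic-counts": "hourly vehicle counts on urban roads",
}

query = "health impacts of mould exposure"
ranked = sorted(datasets, key=lambda k: similarity(query, datasets[k]),
                reverse=True)
print(ranked[0])  # the composting dataset still ranks first
```

The point is the mechanism, not the scoring: because "mould" and "composting" both expand to "spores", the query and the dataset description overlap semantically even though they share no words, which is exactly how embedded metadata lets imperfect descriptions remain discoverable.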
The NERC Digital Solutions Programme has worked directly with environmental data users to ensure that AI-powered tools available on the NERC Digital Solutions Hub (DSH) are grounded in practical needs. By integrating LLMs with strong metadata frameworks and participatory design, the DSH empowers policymakers to discover evidence more efficiently, trace its provenance, and apply it with confidence to address urgent environmental challenges. In doing so, we are not just improving data access; we are reimagining what responsive, AI-enhanced environmental science can look like.
Centre environmental problem-holders in policy design
Policymakers should be at the forefront of technological development. By engaging directly with initiatives like the DSH, they gain early visibility into practical use cases, opportunities, and challenges. This insight enables them to advocate for and shape policy that supports innovation while ensuring ethical, equitable, and sustainable deployment. This is timely as the UK government’s AI Action Plan acknowledges that trustworthy, high-performing AI will be essential to achieving the government’s missions, from building an NHS fit for the future to making Britain a clean energy superpower.
Regulations and investment strategies should reflect the realities of interdisciplinary, applied science and support actionable insights. To strengthen engagement with on-the-ground expertise, this could include establishing a cross-sector expert panel focused on AI and environmental data, like that seen in GO-Science, with rotating membership from academia, public bodies, and practitioners, thereby embedding more agile, domain-specific expertise into government decision-making.
Build capacity within public sector organisations
Whilst the UK’s principles-based approach differs from other developing global governance models for generative AI, emerging collaboration agreements between the EU and UK will support joint development of new tools around, for example, AI factories. This might lead to appropriate governance by design. However, to effectively address the challenges of fragmented data landscapes and unlock the potential of AI technologies, all policy-driving organisations, including government departments and local authorities, should develop or update their AI strategies.
The AI playbook, released this year, provides government departments and public sector organisations with accessible technical guidance on the safe and effective use of AI. However, such guidance needs to go further so that government can ensure its public services deliver the best possible outcomes for citizens and businesses across the UK. The Department for Science, Innovation and Technology (DSIT) should ensure that AI strategies are grounded in a clear understanding of the current data ecosystem, the evolving landscape of generative AI tools, and the infrastructure needed to enable responsible, scalable adoption. To achieve this, DSIT should invest in training and infrastructure, and facilitate access to expert guidance within government agencies to assist with their AI strategies. If public sector bodies move quickly to adopt and model good data practices, they can set visible standards that place constructive pressure on others to follow.
Now is the time to explore the use of LLMs for data search and discovery. A growing ecosystem of tools and platforms already exists, many of which can be trialled with minimal investment and without becoming dependent on proprietary solutions. These tools, such as the DSH, are not only accessible but are being taught and adopted by the next generation of scientists, analysts, and civil servants. Building awareness, testing these technologies in real-world contexts, and learning from early implementation efforts are cost-effective, future-proof steps that organisations can take now.