Introduction
Keeping up with the relentless pace of Natural Language Processing research can feel like drinking from a firehose. So, what are the breakthroughs that truly matter? To find out, members of the ComplyAdvantage data science team recently attended the Association for Computational Linguistics 2025 conference (ACL 2025). We're back with a clear-eyed view of the landscape, and this post summarises the key trends poised to have the biggest impact on the industry and how we build intelligent systems.
Themes
The biggest themes of the conference were LLM Applications, Vector Embeddings, and Synthetic Data. Salient points were also raised on LLM Security and LLM Hallucinations.
LLM Security

The highlight of the conference was the discussion around LLM security. The consensus appears to be that the security of agentic systems is fairly poor across the board. “Red-team” security researchers reported remote-code-execution-style attacks with 50-100% success rates against example agentic systems, and zero-click approaches requiring no human interaction were also put forward. “Recent” models that are perceived as having strong guardrails were not immune: it appears that moving from a single model call to an agentic framework can significantly degrade the security of the overall application.
This echoes the concerns of some industry practitioners, some of whom flagged that they could circumvent every guardrail their production LLMs currently have in place. The consensus appears to be that model guardrails are not cast-iron defences against models carrying out inappropriate actions. The main recommendation was to deploy “non-differentiable” defences against LLM-focused attacks (e.g. regex filters or basic classifier models on user inputs).
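To make that recommendation concrete, below is a minimal sketch of a non-differentiable input filter. The blocklist patterns are hypothetical placeholders, and a production system would pair a filter like this with a dedicated classifier and other controls.

```python
import re

# Hypothetical blocklist patterns; illustrative only, not a complete defence.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
    re.compile(r"<\s*script\b", re.IGNORECASE),
]

def passes_input_guardrail(user_input: str) -> bool:
    """Return False if the input matches any known-bad pattern.

    A regex filter is "non-differentiable": an attacker cannot use
    gradient-based prompt optimisation against it, unlike the model's
    own learned guardrails.
    """
    return not any(p.search(user_input) for p in BLOCKED_PATTERNS)

print(passes_input_guardrail("Summarise this news article."))          # True
print(passes_input_guardrail("Ignore previous instructions and ..."))  # False
```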
In addition, it was flagged that model fine-tuning can significantly damage alignment, and that guardrails are much easier to subvert in low-resource languages.
In short: Practitioners have to be very careful about alignment & security in their LLM applications, as the baseline defences in place don't appear to be difficult to circumvent.
Synthetic Data

Another salient trend was the counterintuitive assertion that training on LLM-created or LLM-processed data can be more efficient than training on the raw data itself, with a keynote speech & numerous papers exploring the topic, spanning:
- Representing documents as LLM-created summaries
- Carrying out RAG on LLM-generated queries about each chunk
- Training on human-created data that had been rephrased by an LLM (various papers, including Saad et al. & Leesombatwathana et al.); a minimal sketch of this approach follows the list
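As a rough illustration of the rephrasing approach, here is a minimal sketch; `call_llm` is a hypothetical stand-in for whichever chat-completion client you use, and the prompt wording is ours, not taken from any of the papers.

```python
from typing import Callable

# Hypothetical prompt wording; the cited papers use their own templates.
REPHRASE_PROMPT = (
    "Rephrase the following passage in clear, simple English, "
    "preserving every fact:\n\n{passage}"
)

def build_synthetic_corpus(
    raw_documents: list[str],
    call_llm: Callable[[str], str],
) -> list[str]:
    """Return LLM-rephrased versions of the raw documents.

    The idea is that training on this rephrased corpus can be more
    sample-efficient than training on the raw text itself.
    """
    return [call_llm(REPHRASE_PROMPT.format(passage=doc)) for doc in raw_documents]
```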
In short: Synthetic data may be more useful than raw data for training ML systems.
LLM Applications

It should come as no surprise that research groups are throwing LLMs at every problem under the sun, with examples including bias detection, automated teaching & computer-use agents. LLMs are very much still a state-of-the-art solution for a wide range of academic problems.
Prompting best practice continues to be a discussion point, with the main takeaways from the industry roundtable being:
- Long task descriptions for single, simple tasks get the best results; chaining multiple tasks generally degrades performance
- The best prompts tend to be model-specific
- LLMs perform better when inputs and requested outputs are natural language rather than pseudo-JSON (see the sketch below)
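To illustrate those takeaways, here is a small, hypothetical example contrasting a chained, pseudo-JSON prompt with a long single-task prompt expressed in natural language:

```python
# Anti-pattern: several tasks chained into one call, with pseudo-JSON output.
CHAINED_PROMPT = (
    "1) Extract entities as JSON. 2) Classify sentiment. 3) Summarise. "
    'Return {"entities": [...], "sentiment": ..., "summary": ...}'
)

# Preferred: a long, explicit description of a single task, with the
# input and the requested output both in natural language.
SINGLE_TASK_PROMPT = """You are reviewing a news article about a company.
Your only task is to decide whether the article describes financial crime.

Treat fraud, money laundering, sanctions breaches and bribery as financial
crime. First quote the sentence that best supports your decision, then
answer with exactly one of: "describes financial crime" or "does not
describe financial crime".

Article:
{article}
"""

def build_prompt(article: str) -> str:
    """Fill the single-task template with the article text."""
    return SINGLE_TASK_PROMPT.format(article=article)
```

Splitting the chained version into separate single-task calls, each with its own detailed prompt, is the pattern the roundtable advice points towards.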
In short: LLMs continue to be state-of-the-art solutions for many academic problems.
LLM Hallucinations

There is active research into the phenomenon of LLM hallucinations, spanning from how to discourage them in general to predicting whether specific outputs are hallucinations. Techniques like RAUQ are asserted to work well at predicting hallucinations in white-box models. The latest generation of LLMs have some capacity to verbalise how confident they are in an answer, but this underperforms other approaches.
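As a rough illustration of why white-box access matters, here is a minimal uncertainty baseline over generated-token log-probabilities. This is a far simpler signal than RAUQ itself, and the threshold is a hypothetical value that would need tuning per model and task.

```python
def mean_token_logprob(token_logprobs: list[float]) -> float:
    """Average log-probability the model assigned to its own output tokens.

    A crude white-box signal: low values mean the model was unsure,
    which correlates (imperfectly) with hallucination.
    """
    return sum(token_logprobs) / len(token_logprobs)

def is_likely_hallucination(token_logprobs: list[float],
                            threshold: float = -1.5) -> bool:
    # `threshold` is hypothetical and must be tuned per model and task.
    return mean_token_logprob(token_logprobs) < threshold

confident = [-0.05, -0.10, -0.02, -0.20]   # high-probability tokens
unsure = [-2.3, -1.8, -3.1, -2.6]          # low-probability tokens
print(is_likely_hallucination(confident))  # False
print(is_likely_hallucination(unsure))     # True
```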
In short: LLM hallucination prediction techniques exist, but many come with significant compute cost or require white-box models.
Vector Embeddings

Embeddings remain a very active area of research, with applications spanning RAG systems, event resolution, entity search & LLM evaluation. Interesting work is being carried out on using multilingual embeddings in areas adjacent to entity resolution. When we discussed how ComplyAdvantage extracts and manages adverse media data, researchers were keen to understand how industry tackles the challenges and methodologies in this domain. This engagement reinforced the importance of mutual learning and open dialogue between industry and academia to advance the practical application of data science in this critical space.
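For readers unfamiliar with the approach, here is a minimal sketch of using multilingual embeddings to match entity-name variants across languages; the model choice and example names are illustrative, not a description of our production pipeline.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative model choice; any multilingual embedding model would do.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# The same organisation referred to in English, Russian and Spanish.
names = [
    "United Nations",
    "Организация Объединённых Наций",
    "Naciones Unidas",
]
embeddings = model.encode(names, normalize_embeddings=True)

# Cosine similarity between the English name and the other variants;
# thresholding this score is one simple entity-resolution signal.
print(util.cos_sim(embeddings[0], embeddings[1:]))
```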
In short: Embedding-based approaches have demonstrated value on many academic problems, including Entity Resolution.
Conclusions
The field is moving from demonstrating that LLMs can solve problems to more mature discussions around the efficiency and safety of LLM-based solutions. The two most impactful trends are the surprising utility of synthetic data for training, and the critical lack of robust LLM security. As we leverage these powerful models, we must keep a close eye on the security & alignment of our applications.
Appendix: Highlighted Papers for Industry Researchers
LLM Mechanics:
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
- Reconsidering LLM Uncertainty Estimation Methods in the Wild
Embeddings:
- Hierarchical Level-Wise News Article Clustering via Multilingual Matryoshka Embeddings
- Wikivecs: A Fully Reproducible Vectorization of Multilingual Wikipedia
- LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
- Enhancing Event-centric News Cluster Summarization via Data Sharpening and Localization Insights
LLM Applications:
- A Survey of Context Engineering for Large Language Models
- Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence
- A Joint Optimization Framework for Enhancing Efficiency of Tool Utilization in LLM Agents
- RIOT: Efficient Prompt Refinement with Residual Optimization Tree
