How GenAI-Powered Synthetic Data Is Reshaping Investment Workflows

  • July 31, 2025
  • Roubens Andy King

In today’s data-driven investment environment, the quality, availability, and specificity of data can make or break a strategy. Yet investment professionals routinely face limitations: historical datasets may not capture emerging risks, alternative data is often incomplete or prohibitively expensive, and open-source models and datasets are skewed toward major markets and English-language content.

As firms seek more adaptable and forward-looking tools, synthetic data, particularly when derived from generative AI (GenAI), is emerging as a strategic asset, offering new ways to simulate market scenarios, train machine learning models, and backtest investing strategies. This post explores how GenAI-powered synthetic data is reshaping investment workflows, from simulating asset correlations to enhancing sentiment models, and what practitioners need to know to evaluate its utility and limitations.

What exactly is synthetic data, how is it generated by GenAI models, and why is it increasingly relevant for investment use cases?

Consider two common challenges. A portfolio manager looking to optimize performance across varying market regimes is constrained by historical data, which can’t account for “what-if” scenarios that have yet to occur. Similarly, a data scientist monitoring sentiment in German-language news for small-cap stocks may find that most available datasets are in English and focused on large-cap companies, limiting both coverage and relevance. In both cases, synthetic data offers a practical solution.

What Sets GenAI Synthetic Data Apart—and Why It Matters Now

Synthetic data refers to artificially generated datasets that replicate the statistical properties of real-world data. While the concept is not new — techniques like Monte Carlo simulation and bootstrapping have long supported financial analysis — what’s changed is the how.
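For reference, the traditional techniques named above can be sketched in a few lines. The following is a minimal Python illustration of bootstrapping, assuming a short list of hypothetical daily returns (not real market data); `bootstrap_paths` is an illustrative name, not a library function.

```python
import random
import statistics

def bootstrap_paths(returns, n_paths, path_len, seed=0):
    """Resample historical returns with replacement to build synthetic paths."""
    rng = random.Random(seed)
    return [[rng.choice(returns) for _ in range(path_len)] for _ in range(n_paths)]

# Hypothetical daily returns, for illustration only (not real market data).
historical = [0.004, -0.012, 0.007, 0.001, -0.003, 0.015, -0.021, 0.006]

paths = bootstrap_paths(historical, n_paths=1000, path_len=20)

# Compound each synthetic path into a cumulative 20-day return.
totals = []
for path in paths:
    growth = 1.0
    for r in path:
        growth *= 1.0 + r
    totals.append(growth - 1.0)

print(f"mean 20-day return: {statistics.mean(totals):.4f}")
print(f"5th percentile:     {sorted(totals)[int(0.05 * len(totals))]:.4f}")
```

Every draw here is an observed historical return, so no synthetic day can be worse than the worst day in the sample, and resampling days independently discards serial dependence such as volatility clustering. Constraints of this kind are what learned generative models aim to relax.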

GenAI refers to a class of deep-learning models capable of generating high-fidelity synthetic data across modalities such as text, tabular, image, and time-series. Unlike traditional methods, GenAI models learn complex real-world distributions directly from data, eliminating the need for rigid assumptions about the underlying generative process. This capability opens up powerful use cases in investment management, especially in areas where real data is scarce, complex, incomplete, or constrained by cost, language, or regulation.

Common GenAI Models

GenAI models come in several types. The most common are variational autoencoders (VAEs), generative adversarial networks (GANs), diffusion-based models, and large language models (LLMs). All are built on neural network architectures, though they differ in size and complexity. These methods have already shown potential to enhance data-centric workflows in the industry. For example, VAEs have been used to create synthetic volatility surfaces to improve options trading (Bergeron et al., 2021). GANs have been applied to portfolio optimization and risk management (Zhu, Mariani and Li, 2020; Cont et al., 2023). Diffusion-based models have been used to simulate asset return correlation matrices under various market regimes (Kubiak et al., 2024). And LLMs have been used for market simulations (Li et al., 2024).

Table 1. Approaches to synthetic data generation.

| Method | Types of data it generates | Example applications | Generative? |
| --- | --- | --- | --- |
| Monte Carlo | Time-series | Portfolio optimization, risk management | No |
| Copula-based functions | Time-series, tabular | Credit risk analysis, asset correlation modeling | No |
| Autoregressive models | Time-series | Volatility forecasting, asset return simulation | No |
| Bootstrapping | Time-series, tabular, textual | Creating confidence intervals, stress-testing | No |
| Variational Autoencoders | Tabular, time-series, audio, images | Simulating volatility surfaces | Yes |
| Generative Adversarial Networks | Tabular, time-series, audio, images | Portfolio optimization, risk management, model training | Yes |
| Diffusion models | Tabular, time-series, audio, images | Correlation modeling, portfolio optimization | Yes |
| Large language models | Text, tabular, images, audio | Sentiment analysis, market simulation | Yes |

Evaluating Synthetic Data Quality

Synthetic data should be realistic and match the statistical properties of your real data. Existing evaluation methods fall into two categories: quantitative and qualitative.

Qualitative approaches involve visually comparing real and synthetic datasets. Examples include plotting distributions, scatterplots of variable pairs, time-series paths, and correlation matrices. For example, a GAN trained to simulate asset returns for estimating value-at-risk should reproduce the heavy tails of the return distribution. A diffusion model trained to produce synthetic correlation matrices under different market regimes should adequately capture asset co-movements.
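A visual check of the tails can be backed by two simple numbers: excess kurtosis and an empirical value-at-risk. The sketch below is illustrative only. It assumes Student-t draws as a stand-in for real returns and compares them with two hypothetical generators, one that preserves the heavy tails and one that collapses to a Gaussian. All function names are made up for this example.

```python
import random
import statistics

def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for a normal distribution, > 0 for heavier tails."""
    m = statistics.fmean(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    fourth = sum((x - m) ** 4 for x in xs) / len(xs)
    return fourth / var ** 2 - 3.0

def empirical_var(xs, level=0.99):
    """Historical value-at-risk: the loss exceeded with probability 1 - level."""
    ordered = sorted(xs)
    return -ordered[int((1.0 - level) * len(ordered))]

def student_t(rng, df):
    """Draw from a Student-t distribution (heavy-tailed for small df)."""
    return rng.gauss(0.0, 1.0) / (rng.gammavariate(df / 2.0, 2.0) / df) ** 0.5

rng = random.Random(42)
n = 20000
real = [0.01 * student_t(rng, 5) for _ in range(n)]             # stand-in for real returns
synthetic_heavy = [0.01 * student_t(rng, 5) for _ in range(n)]  # generator that kept the tails
sd = statistics.pstdev(real)
synthetic_gauss = [rng.gauss(0.0, sd) for _ in range(n)]        # generator that lost the tails

samples = [("real", real), ("synthetic (heavy)", synthetic_heavy), ("synthetic (gauss)", synthetic_gauss)]
for name, sample in samples:
    print(f"{name:18s} excess kurtosis={excess_kurtosis(sample):6.2f}  99% VaR={empirical_var(sample):.4f}")
```

A generator that matches the volatility of the real data but reports excess kurtosis near zero is likely to understate tail risk, which matters most for a value-at-risk use case.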

Quantitative approaches compare distributions using statistical tests and distance measures such as the Kolmogorov-Smirnov test, the Population Stability Index, and Jensen-Shannon divergence. Each outputs a statistic indicating how similar two distributions are. For example, the Kolmogorov-Smirnov test outputs a p-value which, if lower than 0.05, suggests the two distributions are significantly different. This provides a more concrete measure of similarity than visualization alone.
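To make the three measures concrete, here is a minimal standard-library sketch. It is not the code used in the case study. The data are simulated, the bin counts are arbitrary choices, the p-value uses a common asymptotic approximation, and all function names are illustrative. In practice a library routine such as SciPy's `ks_2samp` would normally be used for the test itself.

```python
import bisect
import math
import random

def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic approximation of its p-value."""
    a_s, b_s = sorted(a), sorted(b)
    d = 0.0
    for x in a_s + b_s:
        fa = bisect.bisect_right(a_s, x) / len(a_s)
        fb = bisect.bisect_right(b_s, x) / len(b_s)
        d = max(d, abs(fa - fb))
    n_eff = len(a_s) * len(b_s) / (len(a_s) + len(b_s))
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.3:  # the series below converges poorly here; p is effectively 1
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

def bin_shares(xs, edges, floor=0.0):
    """Share of observations in each bin defined by interior cut points `edges`."""
    counts = [0] * (len(edges) + 1)
    for x in xs:
        counts[bisect.bisect_right(edges, x)] += 1
    return [max(c / len(xs), floor) for c in counts]

def psi(expected, actual, n_bins=10):
    """Population Stability Index using quantile bins of the expected sample."""
    exp_sorted = sorted(expected)
    edges = [exp_sorted[len(exp_sorted) * i // n_bins] for i in range(1, n_bins)]
    e = bin_shares(expected, edges, floor=1e-6)  # floor avoids log(0) on empty bins
    a = bin_shares(actual, edges, floor=1e-6)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

def js_divergence(a, b, n_bins=20):
    """Jensen-Shannon divergence (base 2, so between 0 and 1) on shared equal-width bins."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    edges = [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]
    p, q = bin_shares(a, edges), bin_shares(b, edges)
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(u, v):
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = random.Random(7)
real = [rng.gauss(0.0, 1.0) for _ in range(2000)]
good = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # same distribution as real
bad = [rng.gauss(0.5, 1.5) for _ in range(2000)]   # shifted and wider

for name, synth in [("good", good), ("bad", bad)]:
    d, p = ks_two_sample(real, synth)
    print(f"{name}: KS D={d:.3f} p={p:.3f}  PSI={psi(real, synth):.3f}  JS={js_divergence(real, synth):.3f}")
```

All three numbers move in the same direction here: the closer the synthetic sample is to the real one, the smaller the KS statistic, PSI, and Jensen-Shannon divergence. Note that PSI and Jensen-Shannon divergence depend on the binning, so the same bin settings should be used when comparing generators.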

Another approach is “train-on-synthetic, test-on-real,” where a model is trained on synthetic data and tested on real data. Its performance can then be compared with that of a model trained and tested on real data. If the synthetic data successfully replicates the properties of the real data, the two models should perform similarly.
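The train-on-synthetic, test-on-real comparison can be sketched with a toy classifier. Everything below is an assumption made for illustration: the two-dimensional data are simulated, the "synthetic" set is a slightly perturbed version of the real process standing in for a generator's output, and a nearest-centroid classifier replaces whatever model would be used in practice.

```python
import random

def make_data(rng, n, centers, spread):
    """Draw n labelled 2-D points per class around the given class centers."""
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n):
            data.append(((rng.gauss(cx, spread), rng.gauss(cy, spread)), label))
    return data

def fit_centroids(train):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    sums = {}
    for (x, y), label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}

def accuracy(centroids, test):
    """Share of test points whose nearest centroid matches the true label."""
    hits = 0
    for (x, y), label in test:
        pred = min(centroids, key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
        hits += pred == label
    return hits / len(test)

rng = random.Random(0)
real_centers = {"up": (1.0, 1.0), "down": (-1.0, -1.0)}
real_train = make_data(rng, 200, real_centers, spread=1.0)
real_test = make_data(rng, 200, real_centers, spread=1.0)
# Stand-in for a generator's output: close to, but not identical to, the real process.
synthetic_train = make_data(rng, 200, {"up": (0.9, 1.1), "down": (-1.1, -0.9)}, spread=1.1)

trtr = accuracy(fit_centroids(real_train), real_test)       # train on real, test on real
tstr = accuracy(fit_centroids(synthetic_train), real_test)  # train on synthetic, test on real
print(f"TRTR accuracy: {trtr:.3f}  TSTR accuracy: {tstr:.3f}  gap: {trtr - tstr:+.3f}")
```

The quantity of interest is the gap between the two scores on the same real test set. A small gap suggests the synthetic data carries the signal the model needs, while a large gap points to something the generator failed to capture.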

In Action: Enhancing Financial Sentiment Analysis with GenAI Synthetic Data

To put this into practice, I fine-tuned a small open-source LLM, Qwen3-0.6B, for financial sentiment analysis using a public dataset of finance-related headlines and social media content, known as FiQA-SA[1]. The dataset consists of 822 training examples, with most sentences classified as “Positive” or “Negative” sentiment.

I then used GPT-4o to generate 800 synthetic training examples. The synthetic dataset was more diverse than the original training data, covering more companies and a wider spread of sentiment classes (Figure 1). Increasing the diversity of the training data gives the LLM more examples from which to learn to identify sentiment in text, potentially improving model performance on unseen data.

Figure 1. Distribution of sentiment classes for the real (left), augmented (middle), and synthetic (right) training datasets. The augmented dataset combines real and synthetic data.

Table 2. Example sentences from the real and synthetic training datasets.

| Sentence | Class | Data |
| --- | --- | --- |
| Slump in Weir leads FTSE down from record high. | Negative | Real |
| AstraZeneca wins FDA approval for key new lung cancer pill. | Positive | Real |
| Shell and BG shareholders to vote on deal at end of January. | Neutral | Real |
| Tesla’s quarterly report shows an increase in vehicle deliveries by 15%. | Positive | Synthetic |
| PepsiCo is holding a press conference to address the recent product recall. | Neutral | Synthetic |
| Home Depot’s CEO steps down abruptly amidst internal controversies. | Negative | Synthetic |

After fine-tuning a second model on a combination of real and synthetic data using the same training procedure, the F1-score increased by nearly 10 percentage points on the validation dataset (Table 3), with a final F1-score of 82.37% on the test dataset.

Table 3. Model performance on the FiQA-SA validation dataset.

| Model | Weighted F1-Score |
| --- | --- |
| Model 1 (Real) | 75.29% |
| Model 2 (Real + Synthetic) | 85.17% |

I found that increasing the proportion of synthetic data too much had a negative impact. There is a Goldilocks zone between too much and too little synthetic data for optimum results.
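One way to look for that zone is to hold the real training set fixed and vary the share of synthetic examples added to it. The sketch below shows only the dataset-mixing step. The dataset sizes match those given above, but the example rows, the grid of shares, and the `augment` helper are hypothetical, and the fine-tuning and scoring steps are left as a comment.

```python
import random

def augment(real, synthetic, synthetic_share, seed=0):
    """Keep all real examples and add synthetic ones so that they make up
    `synthetic_share` of the combined training set."""
    if not 0.0 <= synthetic_share < 1.0:
        raise ValueError("synthetic_share must be in [0, 1)")
    n_synth = round(len(real) * synthetic_share / (1.0 - synthetic_share))
    if n_synth > len(synthetic):
        raise ValueError("not enough synthetic examples for this share")
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic), n_synth)
    rng.shuffle(mixed)
    return mixed

# Placeholder rows; only the dataset sizes (822 real, 800 synthetic) come from the text.
real = [(f"real headline {i}", "Positive") for i in range(822)]
synthetic = [(f"synthetic headline {i}", "Neutral") for i in range(800)]

for share in (0.0, 0.1, 0.25, 0.4):  # hypothetical grid of synthetic shares
    train = augment(real, synthetic, share)
    print(f"synthetic share={share:.2f}  training examples={len(train)}")
    # Fine-tune on `train` and score on the real validation set here (omitted).
```

Keeping the validation and test sets purely real throughout the sweep means the scores at different shares remain comparable.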

Not a Silver Bullet, But a Valuable Tool

Synthetic data is not a replacement for real data, but it is worth experimenting with. Choose a method, evaluate synthetic data quality, and conduct A/B testing in a sandboxed environment where you compare workflows with and without different proportions of synthetic data. You might be surprised at the findings.

You can view all the code and datasets on the RPC Labs GitHub repository and take a deeper dive into the LLM case study in the Research and Policy Center’s “Synthetic Data in Investment Management” research report.


[1] The dataset is available for download here: https://huggingface.co/datasets/TheFinAI/fiqa-sentiment-classification
