Professors, Predictions, and the Rise of Robo-Analysts

AI vs. Human Analysts: A Financial Study

Recent findings reported by Morningstar indicate that Alejandro Lopez-Lira, a finance professor at the University of Florida, has been empirically testing the predictive capabilities of large language models, such as ChatGPT, DeepSeek, and Grok, in the domain of equity analysis. Preliminary results suggest that these AI systems exhibit a nontrivial capacity to replicate, and in some cases surpass, traditional analyst functions, raising the possibility of partial automation within the financial advisory sector.


Danelfin's AI-powered stock selection strategy reports a return of +263% from January 2017 to August 2024, compared with +189% for the S&P 500 over the same period. According to the platform's AI Score system, US-listed stocks with the highest scores (10/10) outperformed the market by an average of +14.69% (annualized alpha) over three months, while those with the lowest scores (1/10) underperformed by 37.38%.
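To put the cumulative figures on a common footing, they can be converted to annualized rates. The sketch below is a rough illustration, not Danelfin's own calculation. It assumes the window runs from the start of January 2017 to the end of August 2024 (92 months), and the function name `annualized_return` is made up for this example.

```python
def annualized_return(cumulative_return: float, months: int) -> float:
    """Convert a cumulative return (2.63 means +263%) into a compound annual rate."""
    if months <= 0:
        raise ValueError("months must be positive")
    years = months / 12
    return (1 + cumulative_return) ** (1 / years) - 1


# Assumption: January 2017 through August 2024 counted as 92 full months.
MONTHS = 92

strategy_cagr = annualized_return(2.63, MONTHS)   # reported strategy return, +263%
benchmark_cagr = annualized_return(1.89, MONTHS)  # reported S&P 500 return, +189%

print(f"Strategy: {strategy_cagr:.1%} per year")
print(f"S&P 500:  {benchmark_cagr:.1%} per year")
print(f"Gap:      {strategy_cagr - benchmark_cagr:.1%} per year")
```

Under that assumption the gap comes to a few percentage points per year, which is easier to compare across periods than the 74-point difference in cumulative returns.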

Empirical evaluations of AI-driven equity selection have produced mixed yet encouraging outcomes. In one controlled trial, a portfolio of two AI-identified equities generated an average return of 10.74% over a 30-trading-day horizon, with one security outperforming the S&P 500 by nearly a factor of five. Separately, the AI platform AltIndex reports a historical hit rate of approximately 70%, with mean returns of 22% over a six-month holding period. Nevertheless, consistent with standard financial disclosure practices, these platforms caution that historical performance is not indicative of future results: capital markets remain unpredictable even under advanced algorithmic methods.

Lopez-Lira's Research Methodology

Lopez-Lira's approach to testing AI stock-picking capabilities has evolved over time. Initially, he ran a simple experiment to determine whether ChatGPT could correctly judge news headlines as positive or negative for a stock, an experiment that reportedly yielded a 512% return. For real-money applications through the Autopilot investment app, he developed a more sophisticated process in which AI models assign each company a score from 1 to 100 based on comprehensive data, including macroeconomic conditions, geopolitical risks, and company financials.
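One way a 1 to 100 score could be turned into a portfolio is sketched below. This is purely illustrative: the article does not say how scores are mapped to holdings, so the top-N cut-off, the score-proportional weighting, the tickers, and the function name `build_portfolio` are all assumptions made for the example.

```python
from typing import Dict


def build_portfolio(scores: Dict[str, int], top_n: int = 3) -> Dict[str, float]:
    """Pick the top_n highest-scored tickers and weight them in proportion to score.

    Hypothetical rule for illustration only; not the method used in the research.
    """
    for ticker, score in scores.items():
        if not 1 <= score <= 100:
            raise ValueError(f"score for {ticker} must be between 1 and 100")
    # Sort by score, highest first; break ties alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    total = sum(score for _, score in ranked)
    return {ticker: score / total for ticker, score in ranked}


# Made-up tickers and scores standing in for model output.
example_scores = {"AAA": 90, "BBB": 60, "CCC": 50, "DDD": 20}
weights = build_portfolio(example_scores, top_n=3)
for ticker, weight in weights.items():
    print(f"{ticker}: {weight:.1%}")
```

Proportional weighting is only one choice. Equal weights, or weights capped per position, would be just as easy to swap in.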

Lopez-Lira has since broadened his experiments by relaxing the constraints on the AI models. He now uses OpenAI’s o3, xAI’s Grok 3, and DeepSeek R1 to construct portfolios of 15 positions, with the models themselves determining the asset allocations and weightings. In joint research with collaborators from the Federal Reserve and the University of Cologne, he applied machine learning techniques to evaluate approximately 200 investment hypotheses. The analysis identified a novel predictor, a ratio involving sales from acquisitions and rental expenses, which demonstrated superior out-of-sample performance relative to conventional valuation metrics such as the book-to-market ratio. Specifically, this alternative measure generated post-2012 monthly excess returns of 1.03%, compared to sub-0.1% returns for traditional factors.
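For a sense of scale, a monthly excess return can be compounded into an annual figure. The sketch below assumes the 1.03% is earned at a constant rate every month, which real factor returns are not. It uses 0.1% as an upper bound for the "sub-0.1%" traditional factors, and `compound_monthly` is a name chosen for this example.

```python
def compound_monthly(monthly_return: float, months: int = 12) -> float:
    """Compound a constant monthly return (0.0103 means 1.03%) over `months` months."""
    return (1 + monthly_return) ** months - 1


# Assumption: constant monthly returns, no costs, no volatility drag.
new_predictor = compound_monthly(0.0103)     # the 1.03% monthly figure from the text
traditional_upper = compound_monthly(0.001)  # upper bound for "sub-0.1%" factors

print(f"New predictor:       {new_predictor:.1%} per year")
print(f"Traditional (upper): {traditional_upper:.1%} per year")
```

On these assumptions the new predictor compounds to roughly 13% a year, against a little over 1% for the traditional factors.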

ChatGPT vs DeepSeek Trading Comparison

When comparing ChatGPT and DeepSeek for trading applications, each platform demonstrates distinct strengths. ChatGPT handles complex trading instructions well and has proven more effective at capturing economic news that is linked to the market risk premium. In direct trading strategy tests, ChatGPT performed better on challenges involving complex indicators, successfully generating a profitable strategy with 514 trades and a 33% win rate.
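A 33% win rate can still be profitable if winning trades are large enough relative to losing ones. The sketch below works out the break-even ratio of average win to average loss for a given win rate. It ignores trading costs, and the payoff ratio of 2.5 in the usage lines is made up, since the article does not report average win or loss sizes.

```python
def breakeven_payoff_ratio(win_rate: float) -> float:
    """Average win / average loss needed for zero expected profit per trade."""
    if not 0 < win_rate < 1:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return (1 - win_rate) / win_rate


def expectancy(win_rate: float, payoff_ratio: float) -> float:
    """Expected profit per trade, measured in units of the average loss."""
    return win_rate * payoff_ratio - (1 - win_rate)


WIN_RATE = 0.33  # from the text
TRADES = 514     # from the text

required = breakeven_payoff_ratio(WIN_RATE)
print(f"Break-even payoff ratio at {WIN_RATE:.0%} wins: {required:.2f}")

# Hypothetical payoff ratio, chosen only to illustrate a profitable case.
assumed_ratio = 2.5
per_trade = expectancy(WIN_RATE, assumed_ratio)
print(f"Expectancy per trade at ratio {assumed_ratio}: {per_trade:.3f} average losses")
print(f"Over {TRADES} trades: {per_trade * TRADES:.1f} average losses")
```

In other words, a strategy that wins one trade in three needs its average winner to be a little more than twice the size of its average loser before it breaks even.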

While DeepSeek underperforms ChatGPT in equity return forecasting, potentially because of the latter's more extensive training on English-language financial corpora, it exhibits comparative advantages in algorithmic strategy formulation and complex quantitative computation. This divergence shows up in market-level outcomes: information surfaced by DeepSeek appears to be rapidly priced in, yielding limited predictive value for future returns.

By contrast, signals identified by ChatGPT are associated with both immediate and persistent return predictability, extending up to a six-month horizon. From a use-case perspective, ChatGPT demonstrates stronger performance in fundamental analysis and contextual interpretation of financial news, whereas DeepSeek is more effective in code generation and high-frequency data processing, favoring technical or algorithmic trading applications.
