Financial News Corpus
About this Dataset
One million financial news articles sourced from 47 publishers covering global equity, commodity, crypto, and macroeconomic events from January 2020 through December 2024. Each article is tagged with named entities (companies, people, instruments), an event type label, and a market-impact score derived from same-day price movement of mentioned tickers. Content from paywalled sources was legally licensed. The dataset is partitioned by year and asset class to facilitate temporal backtesting.
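As a rough illustration of how the year partitions could support temporal backtesting, the sketch below builds walk-forward folds in which every training article predates the test year. The record layout (`year`, `asset_class`, `headline`) and the function name `walk_forward_splits` are assumptions made for this example; the listing does not document the actual schema or file format.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout: the listing does not document the real schema.
Article = Dict[str, object]


def walk_forward_splits(
    articles: List[Article], min_train_years: int = 1
) -> Iterator[Tuple[List[Article], List[Article], int]]:
    """Yield (train, test, test_year) folds where train strictly precedes test."""
    years = sorted({int(a["year"]) for a in articles})
    for i in range(min_train_years, len(years)):
        test_year = years[i]
        # Only articles from earlier years may be used for training,
        # which avoids look-ahead leakage into the test year.
        train = [a for a in articles if int(a["year"]) < test_year]
        test = [a for a in articles if int(a["year"]) == test_year]
        yield train, test, test_year


# Tiny made-up sample, for illustration only.
sample: List[Article] = [
    {"year": 2020, "asset_class": "equity", "headline": "A"},
    {"year": 2021, "asset_class": "crypto", "headline": "B"},
    {"year": 2021, "asset_class": "equity", "headline": "C"},
    {"year": 2022, "asset_class": "commodity", "headline": "D"},
]

folds = list(walk_forward_splits(sample))
for train, test, test_year in folds:
    print(test_year, len(train), len(test))
```

Filtering each fold by `asset_class` before training would give per-asset-class backtests, if the partitions are laid out as the description suggests.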
Validation Report
Quality Analysis
Issues
- 6% of articles contain paraphrased paywalled content
- Entity tagging inconsistency observed across two source publications
Strengths
- 47-source coverage spanning global financial press
- 5-year temporal span for robust backtesting
- Market-impact correlation annotations
- Event-type labels across 12 financial event categories
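The description says the market-impact score is derived from same-day price movement of mentioned tickers, but the listing does not give the formula. Purely as an assumption, the sketch below takes the mean absolute open-to-close return across the mentioned tickers that have price data. The function names and the `open`/`close` price layout are hypothetical and are not the dataset's actual methodology.

```python
from typing import Dict, List, Optional


def same_day_return(open_price: float, close_price: float) -> float:
    """Fractional open-to-close price change for one trading day."""
    return (close_price - open_price) / open_price


def impact_score(
    tickers: List[str], prices: Dict[str, Dict[str, float]]
) -> Optional[float]:
    """Assumed scoring rule: mean absolute same-day return of mentioned tickers.

    Tickers without price data (or with a non-positive open) are skipped.
    Returns None when no mentioned ticker has usable prices.
    """
    moves = [
        abs(same_day_return(prices[t]["open"], prices[t]["close"]))
        for t in tickers
        if t in prices and prices[t]["open"] > 0
    ]
    if not moves:
        return None
    return sum(moves) / len(moves)


# Made-up tickers and prices, for illustration only.
prices = {
    "AAA": {"open": 100.0, "close": 104.0},
    "BBB": {"open": 50.0, "close": 49.0},
}
score = impact_score(["AAA", "BBB", "ZZZ"], prices)
print(score)
```

Using the absolute return treats sharp rises and sharp falls as equally impactful; a signed variant would instead capture direction.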
Category & Use Cases
Recommended Use Cases
Originality Check
Compiled from licensed proprietary sources with a unique market-impact labeling methodology not present in public financial NLP datasets.
Access Price
Total Purchases: 34
Listed: Apr 22, 2026