The iDATA repository contains nearly 30 million news stories from around the world (in English, Spanish, and Portuguese), dating from January 2001 and published by over 6,000 international, regional, national, and local news publishers. These unclassified stories are obtained through both Factiva and the government Open Source Center (OSC). The stories are processed with innovative deep-parsing (BBN Serif) and shallow-parsing (JabariNLP) technologies to produce a set of over 19 million unique geolocated events with an accuracy of greater than 80%. The event data consists of date-stamped and geolocated event triples that recount “who did what to whom”.
There are over 300 different types of coded events drawn from the CAMEO (Conflict and Mediation Event Observations) taxonomy, with each event type having an observer-neutral intensity (Goldstein) score that represents how hostile or how cooperative the event is. The actors (country, sector, organization, individual) involved in events come from dictionaries of over 50,000 named and time-indexed entities as well as over 700 generic agents (e.g. police, government official, protestor).
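As a rough illustration of the event triples described above, the following Python sketch models a single coded event and averages intensity for one actor pair. The class name, field names, helper function, event codes, and scores are all hypothetical placeholders rather than the iDATA schema or official CAMEO/Goldstein values; the only outside fact assumed is that the published Goldstein scale runs from -10 (most hostile) to +10 (most cooperative).

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    """One date-stamped, geolocated 'who did what to whom' record (hypothetical layout)."""
    event_date: date
    source: str       # who
    event_code: str   # did what (placeholder label, not a real CAMEO code)
    target: str       # to whom
    latitude: float
    longitude: float
    intensity: float  # assumed Goldstein-style score: negative = hostile, positive = cooperative

    def __post_init__(self) -> None:
        # Assumption: scores are bounded by the Goldstein range of -10 to +10.
        if not -10.0 <= self.intensity <= 10.0:
            raise ValueError("intensity must lie between -10 and +10")


def mean_intensity(events: Iterable[Event], source: str, target: str) -> Optional[float]:
    """Average intensity of events from `source` toward `target`, or None if there are none."""
    scores = [e.intensity for e in events if e.source == source and e.target == target]
    return sum(scores) / len(scores) if scores else None


# Made-up sample records, for illustration only.
events = [
    Event(date(2012, 1, 5), "police", "CODE_X", "protestor", 0.0, 0.0, -7.0),
    Event(date(2012, 1, 6), "police", "CODE_Y", "protestor", 0.0, 0.0, -3.0),
    Event(date(2012, 1, 7), "government official", "CODE_Z", "protestor", 0.0, 0.0, 4.0),
]
print(mean_intensity(events, "police", "protestor"))
```

Keeping the intensity score on each record, rather than looking it up per query, makes simple aggregates like this one easy to compute; a real store would more likely keep a separate table mapping event types to scores.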
The iDATA repository also contains data from some 30 different state data sources, providing primarily quantitative data on over 175 different countries. Many countries do not have reliable published information (e.g., the GDP of Afghanistan), so iDATA uses a hybrid copula method to impute missing data records.
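The hybrid copula method itself is not described here, so the sketch below substitutes a plain bivariate Gaussian copula to show the general idea of copula-based imputation: learn how two indicators move together from countries where both are reported, then fill in a gap for a country where only one is. The function names and the sample numbers are invented for illustration and do not reflect the iDATA implementation.

```python
from bisect import bisect_right
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the latent copula space


def normal_scores(values):
    """Convert values to standard-normal scores via their ranks (ties broken by order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = _STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def impute_missing(known_x, known_y, x_new):
    """Impute y for x_new from complete (x, y) pairs using a Gaussian copula."""
    n = len(known_x)
    # Dependence between the two indicators, measured in the latent normal space.
    rho = pearson(normal_scores(known_x), normal_scores(known_y))
    # Empirical CDF position of x_new, clamped so it stays strictly inside (0, 1).
    count = bisect_right(sorted(known_x), x_new)
    u_x = min(max(count, 1), n) / (n + 1)
    # Conditional median of the latent y score given the latent x score.
    z_y = rho * _STD_NORMAL.inv_cdf(u_x)
    u_y = _STD_NORMAL.cdf(z_y)
    # Map back to the data scale through the empirical quantiles of observed y.
    sorted_y = sorted(known_y)
    index = min(n - 1, max(0, round(u_y * (n + 1)) - 1))
    return sorted_y[index]


# Invented values: x is an indicator that is reported, y is one with a gap.
reported_x = [1, 2, 3, 4, 5]
reported_y = [10, 20, 30, 40, 50]
print(impute_missing(reported_x, reported_y, 4))
```

Because the copula works on ranks, the imputed value always falls within the range of observed values, which is a deliberate simplification in this sketch; a production method would also need to handle many indicators at once and quantify uncertainty.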
iDATA provides the underlying data that is leveraged by the iTRACE (trending) and iCAST (forecasting) components of the ICEWS system.
The iDATA repository is one of the largest unified human socio-cultural data sets, and it can be exploited by both operators and social science modelers.
Key Features and Benefits
- 12-year unified set of human socio-cultural data
- Event data describing who did what to whom, when, and where
- State data set of some 30 sources, unified for analysis and processing
For more information, please contact us at firstname.lastname@example.org