Language: Python
I.What are SeekingAlpha CC and IBES, respectively?
SeekingAlpha is an online financial community for investors to share investing ideas and discuss inveting related news. It includes the posts of social media analysts and the transcripts of company conference calls (CC).
We are particularly interested in CC where firm managers and analysts are communicating after firm' earings announcements.
IBES (Institutional Brokers' Estimation System):
- Coverage: global listed firms;
- Format: archival data format;
- Measures: earnings forecasts, recommendations, target price...
- Contributors: sell side security analysts.
II. Why should we merge CC and IBES?
Since IBES only provides quantitative measures of analysts' research, we must merge IBES with CC in order to obtain textual data regarding the dialogues between management and analysts to gain insight into firm disclosures, analysts' involvement, and market reaction.
Essentially, we are mapping CC firm names and analyst names into company tickers and and analyst IDs in IBES. In addition, based on the timestramps in each database, we can also map one cc transcript to an unique EPS/Recommendation/Target Price revision , which is not discussed in this article.
III. How to merge CC and IBES?
Procedures conducted in the CC.
Procedures conducted in the IBES.
First, we need to match the firms in the CC universe with those in the IBES universe by firm ticker.
Then, we match the analysts names in the two datasets.
3.1 Preparation
- Download the Recommendation Detail File form IBES and save it
as _rec.csv_; - Download the CC transcripts from seekingalpha (tutorial coming soon);
Creat and clean the index file for the CC and save it into _dict_index.pkl_;
Note: The index file is saved as an pkl file, which is read as the python dictionary datatype. It's has the following format _{cc_id:[firm_name,ticker,fqtr,cc_time,managers,analysts]}_.
3.2 First Step: Match by firm ids
Convert the historical cusip into the current cusip to accomodate both Compustat and CC.
For this purpose, we backfill the cusip share the same ibes ticker (ticker not the oftic):
df_ibes["cusip8"] = df_ibes.groupby(["ticker"])["cusip"].tail(1) df_ibes["cusip8"] = df_ibes.groupby(["ticker"])["cusip8"].fillna(method='bfill')
- Keep "oftic-year-cusip" level non-duplicated data.
- From CC index file, we retrive the cc id, firm ticker, and year data.
IBES left joins CC on firm ticker and year.
Note: In SeekingAlpha, firms are labeled by historical tickers. Therefore, the "oftic" in IBES is the right key. We also merge by year since two firms may share the same (official) ticker.
3.3 Second Step: Match by analysts names
- Remove any blank spaces, special characters, and tiles in analyst names, drop duplicated analyst ids, drop analyst id equals "00000000" (see. Law WP2022), drop analyst name contains "RESEARCH" (research department), and count unique number of analyst ids per analyst name. We get a name-id linkage file. .
- Split the linkage data into two dataframes:
df_non_dup
anddf_dup.
df_non_dup
consists of analyst names with unique analyst id, whiledf_dup
consists of analyst names corresponding to multiple analyst ids. Then get a list of duplicated ids fromdf_dup
. - For analysts names in
df_dup
, generate a dictionarydict_dup
with firm id the key and (analyst name, analyst id) the value: {cusip: {lead_name:ana_id}}. . - Clean the analysts names from cc index files and get
df_cc_name.
- Merge the
df_cc_name
withdf_non_dup.
- For analyst names that are not found in
df_non_dup
, match within each firm coverage indict_dup.
- Append the two dataframes and get the final matched analyst_name-IBES id. .