Keywords: Database, financial analysts
Three databases
Compustat, CRSP, and IBES are the most widely used databases for research on financial analysts in the United States. This article addresses the problem of merging the three databases.
The firm identifiers in the three databases:
Identifier | CRSP | Compustat | IBES | Explanation |
---|---|---|---|---|
PERMNO | ✔ | CRSP permanent security identifier | ||
GVKEY | ✔ | Compustat company identifier | ||
IBES_TICKER | ✔ | IBES unique company ticker | ||
NCUSIP | ✔ (Historical;9-digit) | "Names" Cusip | ||
CUSIP | ✔ (Current;8-digit) | ✔ (Current;9-digit) | ✔ (Historical;8-digit) | Uniform securities identifier |
TICKER | ✔ (Historical) | ✔ (Current) | ✔ (Historical) | Exchange ticker |
As indicated by the explanations above, PERMNO
and CSUIP
are security ids, and GVKEY
identifies companies. Since a
company can have multiple securities and some private companies also disclose their accounting data, we need to merge public firms
with their ordinary common shares, and then keep companies with analyst coverage. Also, the change of the security TICKER
or CUSIP
often occurs
following a change in the company name or capital structure as a result of a spin-off or an M&A. An example is given at the end of this article.
To the extent that the empirical corporate finance research depends on firm fundamentals data such as firm size, our
data linkage is Compustat-centric. Therefore, to get the CCIM, we first left merge Compustat with CRSP and get CCM, followed by a merge of IBES data.
Attention:
- In this article, all the variables are on a quarterly basis. The following procedures are similar to those for annual measures.
- We focus on US public companies.
Preparation
Raw Data
- Compustat Fundamental Quarterly
- CRSP Monthly Stock File
- IBES Detail History
Environment
- Python 3.x
- pandas 2.0.1
- NumPy 1.24.2
- tqdm 4.65.0
Compustat-CRSP-Merged(CCM)
An easy way to get CCM is to query the WRDS CCM database. Without the CCM database, we can also get the CCM by manually merging Compustat and CRSP.
- Calculate quarterly market returns, quarterly stock returns, and trading volume using the monthly stock file and obtain the _CRSP Quarterly File_.
df_crsp = pd.read_csv("crsp_m.csv", thousands=r',')
df_crsp = df_crsp[["PERMNO","date","SHRCD","NCUSIP","PRC","VOL","RET","SHROUT","vwretd"]]
df_crsp.columns = ["permno","datadate","share_code","ncusip","prc_m","vol_m","ret_m",
"shrout","vwretd_m"]
df_crsp = df_crsp.dropna(subset=["ncusip", "datadate","prc_m", "ret_m"])
df_crsp = df_crsp[~pd.to_numeric(df_crsp.ret_m, errors='coerce').isnull()]
df_crsp = df_crsp.drop_duplicates(subset=["permno","datadate"],keep="last")
df_crsp[["prc_m","vol_m","ret_m","shrout","vwretd_m"]] = df_crsp[["prc_m","vol_m","ret_m","shrout","vwretd_m"]].apply(pd.to_numeric)
## Get quarterly mkt Returns
df_crsp["gross_vwretd_m"] = df_crsp["vwretd_m"] + 1
df_crsp["cul_vwretd_m"] = df_crsp.groupby(["permno"])["gross_vwretd_m"].cumprod()
df_crsp["l3_cul_vwretd_m"] = df_crsp.groupby(["permno"])["cul_vwretd_m"].shift(3)
df_crsp["vwretd_q"] = df_crsp.apply(lambda x: x.cul_vwretd_m/x.l3_cul_vwretd_m-1, axis=1)
## Get quarterly stock returns from monthly returns
df_crsp["l1_prc_q"] = df_crsp.groupby(["permno"])["prc_m"].shift(3)
df_crsp["gross_ret_m"] = df_crsp["ret_m"] + 1
df_crsp["cul_ret"] = df_crsp.groupby(["permno"])["gross_ret_m"].cumprod()
df_crsp["l3_cul_ret"] = df_crsp.groupby(["permno"])["cul_ret"].shift(3)
df_crsp["ret_q"] = df_crsp.apply(lambda x: x.cul_ret/x.l3_cul_ret-1, axis=1)
## Get quarterly stock trading volume from monthly volume
df_crsp["cul_vol"] = df_crsp.groupby(["permno"])["vol_m"].cumsum()
df_crsp["l3_cul_vol"] = df_crsp.groupby(["permno"])["cul_vol"].shift(3)
df_crsp["vol_q"] = df_crsp.apply(lambda x: x.cul_vol-x.l3_cul_vol, axis=1)
df_crsp["datadate"] = (pd.to_datetime(df_crsp["datadate"], format="%Y-%m-%d") + pd.tseries.offsets.MonthEnd(0)).astype(str)
df_crsp["datadate"] = df_crsp["datadate"].str.replace("-","")
df_crsp["l1_prc_m"] = df_crsp.groupby(["permno"])["prc_m"].shift(1)
df_crsp = df_crsp[["permno","ncusip","prc_m","datadate","l1_prc_m","l1_prc_q","ret_q","vol_q", \
"shrout","vwretd_q"]]
df_crsp.columns = ["permno","ncusip","prc_m","datadate","l1_prc_m","l1_prc_q","ret_q","vol_q","shrout","vwretd_q"]
- Convert {
"CUSIP"
:PERMNO
} into a dictionary and mapGVKEY
intoPERMNO
.
##Map the `GVKEY` into the `PERMNO`
df_crsp = pd.read_csv(r"crsp_q.csv",thousands=r',')
dict_cusip_permno = dict(zip(df_crsp.ncusip,df_crsp.permno))
## Read compustat
df_comp = pd.read_csv("compustat_q.csv")
df_comp = df_comp[["cusip","gvkey","cik","tic","fyearq","datafqtr","datacqtr", "datadate","rdq","fqtr","sic","atq","ltq","teqq","revtq","ibq","cogsq","intanq","invtq"]]
df_comp = df_comp.dropna(subset=["gvkey", "cusip", "datadate","atq"])
df_comp = df_comp.drop_duplicates(["gvkey","datafqtr"],keep = "last")
df_comp["l1_atq"] = df_comp.groupby("gvkey")["atq"].shift(1)
df_comp["cusip8"] = df_comp["cusip"].str.slice(stop=8)
df_comp["permno"] = df_comp["cusip8"].map(dict_cusip_permno)
- Left merge Compustat Fundamental Quarterly with CRSP Quarterly File on
PERMNO
andPFEDATS
.
## Get quarterly CCM
df_ccm = pd.merge(df_comp,df_crsp, on=["permno","datadate"],how="left")
df_ccm = df_ccm.dropna(subset=["prc_q","ret_q"])
df_ccm[["permno","cik","rdq","fqtr","sic","shrout"]] = df_ccm[["permno", "cik","rdq","fqtr","sic","shrout"]].apply(pd.to_numeric)
Attention:
- To merge stock returns and firm fundamentals by time, one way is to first convert the last trading date in a month to the month-end date and then merge with the Compustat on
datadate
. Another way is to generate the calendar year quarter for March, June, September, and December and then merge with the firm fundamental ondatacqtr
.
Compustat-CRSP-IBES Merged Data (CCIM)
IBES provides data on financial analysts' research outcomes, including earnings forecasts, price targets, and recommendations, etc. To merge with CCM, we focus on their quarterly earnings forecasts (EPS forecasts). Depending on the specific research questions, we may want the forecasts for individual analysts or the aggregated data at the company level. And for each type, we need to choose between the unadjusted raw data (e.g. _IBES Unadjusted Detail History_) and the adjusted one (e.g. _IBES Detail History_). IBES can adjust these EPS forecasts as well as the companies' actual results under the following conditions:
- Inclusion/exclusion of extraordinary non-recurring items in the actual earnings according to the inclusion of the majority of analysts;
- M&A or spin-offs;
- Stock split;
Here we take the IBES-adjusted summary statistic as an example. Similar to the previous section, we get the CCIM by mapping GVKEY
to IBES_TICKER
and then merging on the FPEDATE
(DATADATE
in Compustat).
- Clean up the IBES Summary Statisic dataset.
## Consensus from IBES summary
df_summary = pd.read_csv("summary_q.csv")
df_summary = df_summary[df_summary["FPI"]==6]
df_summary= df_summary.drop_duplicates(subset=["TICKER","FPEDATS"],keep="last")
df_summary = df_summary[["TICKER","CUSIP,"FPEDATS","MEDEST","NUMEST","MEANEST","STDEV"]]
df_summary.columns = ["ticker_ibes","cusip,"fpedate","medest","n_est","meanest","dispersion"]
- Convert {
"CUSIP"
:"IBES_TICKER"
} into a dictionary and mapGVKEY
intoIBES_TICKER
.
dict_cusip_ibes_ticker = dict(zip(df_summary.cusip,df_summary.ticker_ibes))
df_ccm = df_ccm.rename(columns={"datadate": "fpedate"})
df_ccm["ticker_ibes"] = df_ccm["cusip8"].map(dict_cusip_ibes_ticker)
- Left merge CCM with IBES Summary Statistic on
IBES_TICKER
andFPEDATE
.
## Get quarterly CCM
df_ccim = pd.merge(df_ccm,df_summary, on=["ticker_ibes","fpedate"], how="left")
df_ccim = df_ccim.dropna(subset=["prc_m","ret_q","atq","medest"])
df_ccim.reset_index(drop=True).to_csv("ccim.csv")