Keywords: Database, financial analysts

Three databases

Compustat, CRSP, and IBES are the most widely used databases for research on financial analysts in the United States. This article addresses the problem of merging the three databases.

The firm identifiers in the three databases:

IdentifierCRSPCompustatIBESExplanation
PERMNO CRSP permanent security identifier
GVKEY Compustat company identifier
IBES_TICKER IBES unique company ticker
NCUSIP
(Historical;9-digit)
"Names" Cusip
CUSIP
(Current;8-digit)

(Current;9-digit)

(Historical;8-digit)
Uniform securities identifier
TICKER
(Historical)

(Current)

(Historical)
Exchange ticker

As indicated by the explanations above, PERMNO and CSUIP are security ids, and GVKEY identifies companies. Since a
company can have multiple securities and some private companies also disclose their accounting data, we need to merge public firms
with their ordinary common shares, and then keep companies with analyst coverage. Also, the change of the security TICKER or CUSIP often occurs
following a change in the company name or capital structure as a result of a spin-off or an M&A. An example is given at the end of this article.

To the extent that the empirical corporate finance research depends on firm fundamentals data such as firm size, our
data linkage is Compustat-centric. Therefore, to get the CCIM, we first left merge Compustat with CRSP and get CCM, followed by a merge of IBES data.

Attention:

  1. In this article, all the variables are on a quarterly basis. The following procedures are similar to those for annual measures.
  2. We focus on US public companies.

Preparation

Raw Data

Environment

Compustat-CRSP-Merged(CCM)

An easy way to get CCM is to query the WRDS CCM database. Without the CCM database, we can also get the CCM by manually merging Compustat and CRSP.

  1. Calculate quarterly market returns, quarterly stock returns, and trading volume using the monthly stock file and obtain the _CRSP Quarterly File_.
df_crsp = pd.read_csv("crsp_m.csv", thousands=r',')
df_crsp = df_crsp[["PERMNO","date","SHRCD","NCUSIP","PRC","VOL","RET","SHROUT","vwretd"]]
df_crsp.columns = ["permno","datadate","share_code","ncusip","prc_m","vol_m","ret_m",
"shrout","vwretd_m"]
df_crsp = df_crsp.dropna(subset=["ncusip", "datadate","prc_m", "ret_m"])
df_crsp = df_crsp[~pd.to_numeric(df_crsp.ret_m, errors='coerce').isnull()]
df_crsp = df_crsp.drop_duplicates(subset=["permno","datadate"],keep="last")
df_crsp[["prc_m","vol_m","ret_m","shrout","vwretd_m"]] = df_crsp[["prc_m","vol_m","ret_m","shrout","vwretd_m"]].apply(pd.to_numeric)

## Get quarterly mkt Returns
df_crsp["gross_vwretd_m"] = df_crsp["vwretd_m"] + 1
df_crsp["cul_vwretd_m"] = df_crsp.groupby(["permno"])["gross_vwretd_m"].cumprod()
df_crsp["l3_cul_vwretd_m"] = df_crsp.groupby(["permno"])["cul_vwretd_m"].shift(3)
df_crsp["vwretd_q"] = df_crsp.apply(lambda x: x.cul_vwretd_m/x.l3_cul_vwretd_m-1, axis=1)

## Get quarterly stock returns from monthly returns
df_crsp["l1_prc_q"] = df_crsp.groupby(["permno"])["prc_m"].shift(3)
df_crsp["gross_ret_m"] = df_crsp["ret_m"] + 1
df_crsp["cul_ret"] = df_crsp.groupby(["permno"])["gross_ret_m"].cumprod()
df_crsp["l3_cul_ret"] = df_crsp.groupby(["permno"])["cul_ret"].shift(3)
df_crsp["ret_q"] = df_crsp.apply(lambda x: x.cul_ret/x.l3_cul_ret-1, axis=1)
    
## Get quarterly stock trading volume from monthly volume
df_crsp["cul_vol"] = df_crsp.groupby(["permno"])["vol_m"].cumsum()
df_crsp["l3_cul_vol"] = df_crsp.groupby(["permno"])["cul_vol"].shift(3)
df_crsp["vol_q"] = df_crsp.apply(lambda x: x.cul_vol-x.l3_cul_vol, axis=1)
df_crsp["datadate"] = (pd.to_datetime(df_crsp["datadate"], format="%Y-%m-%d") + pd.tseries.offsets.MonthEnd(0)).astype(str)
df_crsp["datadate"] = df_crsp["datadate"].str.replace("-","") 
df_crsp["l1_prc_m"] = df_crsp.groupby(["permno"])["prc_m"].shift(1)
df_crsp = df_crsp[["permno","ncusip","prc_m","datadate","l1_prc_m","l1_prc_q","ret_q","vol_q", \
"shrout","vwretd_q"]]
df_crsp.columns = ["permno","ncusip","prc_m","datadate","l1_prc_m","l1_prc_q","ret_q","vol_q","shrout","vwretd_q"]
  1. Convert {"CUSIP": PERMNO} into a dictionary and map GVKEY into PERMNO.
##Map the `GVKEY` into the `PERMNO`
df_crsp = pd.read_csv(r"crsp_q.csv",thousands=r',')
dict_cusip_permno = dict(zip(df_crsp.ncusip,df_crsp.permno))

## Read compustat
df_comp = pd.read_csv("compustat_q.csv")
df_comp = df_comp[["cusip","gvkey","cik","tic","fyearq","datafqtr","datacqtr", "datadate","rdq","fqtr","sic","atq","ltq","teqq","revtq","ibq","cogsq","intanq","invtq"]]
df_comp = df_comp.dropna(subset=["gvkey", "cusip", "datadate","atq"])
df_comp = df_comp.drop_duplicates(["gvkey","datafqtr"],keep = "last")
df_comp["l1_atq"] = df_comp.groupby("gvkey")["atq"].shift(1)
df_comp["cusip8"] = df_comp["cusip"].str.slice(stop=8)
df_comp["permno"] = df_comp["cusip8"].map(dict_cusip_permno)
  1. Left merge Compustat Fundamental Quarterly with CRSP Quarterly File on PERMNO and PFEDATS.
## Get quarterly CCM
df_ccm = pd.merge(df_comp,df_crsp, on=["permno","datadate"],how="left")
df_ccm = df_ccm.dropna(subset=["prc_q","ret_q"])
df_ccm[["permno","cik","rdq","fqtr","sic","shrout"]] = df_ccm[["permno", "cik","rdq","fqtr","sic","shrout"]].apply(pd.to_numeric)

Attention:

  1. To merge stock returns and firm fundamentals by time, one way is to first convert the last trading date in a month to the month-end date and then merge with the Compustat on datadate. Another way is to generate the calendar year quarter for March, June, September, and December and then merge with the firm fundamental on datacqtr.

Compustat-CRSP-IBES Merged Data (CCIM)

IBES provides data on financial analysts' research outcomes, including earnings forecasts, price targets, and recommendations, etc. To merge with CCM, we focus on their quarterly earnings forecasts (EPS forecasts). Depending on the specific research questions, we may want the forecasts for individual analysts or the aggregated data at the company level. And for each type, we need to choose between the unadjusted raw data (e.g. _IBES Unadjusted Detail History_) and the adjusted one (e.g. _IBES Detail History_). IBES can adjust these EPS forecasts as well as the companies' actual results under the following conditions:

Here we take the IBES-adjusted summary statistic as an example. Similar to the previous section, we get the CCIM by mapping GVKEY to IBES_TICKER and then merging on the FPEDATE (DATADATE in Compustat).

  1. Clean up the IBES Summary Statisic dataset.
## Consensus from IBES summary 
df_summary = pd.read_csv("summary_q.csv")
df_summary = df_summary[df_summary["FPI"]==6]
df_summary= df_summary.drop_duplicates(subset=["TICKER","FPEDATS"],keep="last")
df_summary = df_summary[["TICKER","CUSIP,"FPEDATS","MEDEST","NUMEST","MEANEST","STDEV"]]
df_summary.columns = ["ticker_ibes","cusip,"fpedate","medest","n_est","meanest","dispersion"]
  1. Convert {"CUSIP": "IBES_TICKER"} into a dictionary and map GVKEY into IBES_TICKER.
dict_cusip_ibes_ticker = dict(zip(df_summary.cusip,df_summary.ticker_ibes))
df_ccm = df_ccm.rename(columns={"datadate": "fpedate"})
df_ccm["ticker_ibes"] = df_ccm["cusip8"].map(dict_cusip_ibes_ticker)
  1. Left merge CCM with IBES Summary Statistic on IBES_TICKER and FPEDATE.
## Get quarterly CCM
df_ccim = pd.merge(df_ccm,df_summary, on=["ticker_ibes","fpedate"], how="left")
df_ccim = df_ccim.dropna(subset=["prc_m","ret_q","atq","medest"])
df_ccim.reset_index(drop=True).to_csv("ccim.csv")

An example of id changes after spin-offs

ID Change after Spin-off An example.png