Extract miRNA names from abstracts in a data frame.

extract_mir_df(
  df,
  threshold = 1,
  col.abstract = Abstract,
  extract_letters = FALSE
)

Arguments

df

Data frame containing abstracts.

threshold

Integer. Minimum number of times a miRNA must be mentioned in an abstract to be extracted.

col.abstract

Symbol. Unquoted name of the column containing the abstracts.

extract_letters

Boolean. If extract_letters = FALSE, only the miRNA stem is extracted (e.g. miR-23). If extract_letters = TRUE, the miRNA stem with trailing letter (e.g. miR-23a) is extracted.
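To illustrate what extract_letters changes, here is a minimal base R sketch. It is not miRetrieve's internal implementation: the helper extract_mir_names() is hypothetical, and it assumes miRNA names appear literally as "miR-" followed by digits and an optional lowercase letter.

```r
# Hypothetical helper, NOT part of miRetrieve: shows the difference between
# extracting only the miRNA stem and the stem plus trailing letter.
extract_mir_names <- function(text, extract_letters = FALSE) {
  # Stem: "miR-" followed by digits; optionally allow one trailing letter.
  pattern <- if (extract_letters) "miR-[0-9]+[a-z]?" else "miR-[0-9]+"
  regmatches(text, gregexpr(pattern, text))[[1]]
}

abstract <- "miR-23a and miR-23b target the same gene; miR-21 does not."
extract_mir_names(abstract)                          # "miR-23" "miR-23" "miR-21"
extract_mir_names(abstract, extract_letters = TRUE)  # "miR-23a" "miR-23b" "miR-21"
```

With the stem only, miR-23a and miR-23b collapse into two mentions of miR-23, which is why stem extraction is more robust when abstracts use the letter suffix inconsistently.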

Value

Data frame with miRNA names extracted from abstracts.

Details

Extract miRNA names from abstracts in a data frame. miRNA names can be extracted either as the stem only, e.g. miR-23, or with their trailing letter, e.g. miR-23a. Names are adapted to the most recent miRBase version; for example, miR-97, miR-102, and miR-180(a/b) become miR-30a, miR-29a, and miR-172(a/b), respectively. The threshold argument sets how often a miRNA must be mentioned in an abstract to be extracted. Abstracts that contain no miRNA names are silently dropped. Because many abstracts do not adhere to the miRNA nomenclature, extracting only the miRNA stem with extract_letters = FALSE is recommended.
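The threshold and dropping behaviour described above can be sketched in base R as follows. This is a hypothetical illustration, not extract_mir_df() itself: the function extract_mir_sketch(), the string column argument col (used instead of the unquoted symbol col.abstract), and the long output format with one row per abstract and miRNA in a column named "miRNA" are all assumptions. It extracts stems only and omits the miRBase renaming step.

```r
# Hypothetical sketch, NOT miRetrieve's implementation.
# Counts miRNA stem mentions per abstract, keeps miRNAs mentioned at least
# `threshold` times, and drops abstracts without any qualifying miRNA.
extract_mir_sketch <- function(df, threshold = 1, col = "Abstract") {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    text <- df[[col]][i]
    hits <- regmatches(text, gregexpr("miR-[0-9]+", text))[[1]]
    if (length(hits) == 0) return(NULL)          # no miRNA: drop abstract
    counts <- table(hits)                        # mentions per miRNA stem
    keep <- names(counts)[counts >= threshold]
    if (length(keep) == 0) return(NULL)          # none reaches threshold
    # Assumed output format: one row per abstract-miRNA pair.
    data.frame(df[rep(i, length(keep)), , drop = FALSE], miRNA = keep,
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)                           # NULL entries are skipped
}

papers <- data.frame(
  PMID = c(1, 2, 3),
  Abstract = c("miR-21 and miR-21 regulate X; miR-23a is also involved.",
               "No microRNA mentioned here.",
               "miR-23b and miR-23a are upregulated."),
  stringsAsFactors = FALSE
)
res <- extract_mir_sketch(papers, threshold = 2)
res[, c("PMID", "miRNA")]
```

With threshold = 2, abstract 1 keeps only miR-21 (miR-23 is mentioned once), abstract 2 is dropped because it names no miRNA, and abstract 3 keeps miR-23 because miR-23a and miR-23b share that stem.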

See also