PubMed Unified REtrieval for Multi-Output Exploration. An R package that provides a single interface for accessing a range of NLM/PubMed databases, including:
- PubMed abstract records,
- iCite bibliometric data,
- PubTator3 named entity annotations, and
- full-text entries from PubMed Central (PMC).

This unified interface simplifies data retrieval, allowing users to work with multiple PubMed services, APIs, and output formats through a single R function.
The package also includes MeSH thesaurus resources as simple data frames (Descriptor Terms, Descriptor Tree Structures, and Supplementary Concept Terms), made available via the mesh-resources library.
Get the released version from CRAN:

    install.packages('puremoe')

Or the development version from GitHub with:

    remotes::install_github("jaytimm/puremoe")

The package has two basic functions: `search_pubmed` and `get_records`. The former fetches PMIDs from the PubMed API based on a user search; the latter retrieves the records for those PMIDs from a user-specified endpoint: `pubmed_abstracts`, `pubmed_affiliations`, `pubtations`, `icites`, or `pmc_fulltext`.
Search syntax is the same as that implemented in standard PubMed search.
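
Because the query is passed as a plain string, it can be assembled programmatically before it is handed to `search_pubmed`. The sketch below is one way to do that in base R. `build_query` is a hypothetical helper written for this illustration and is not part of puremoe; the search terms are arbitrary examples, and the final call is commented out because it assumes an installed package and a network connection.

```r
# Hypothetical helper (not part of puremoe): quote each term, tag it with a
# PubMed field, and join the tagged terms with a Boolean operator.
build_query <- function(terms, field = "TiAb", operator = "OR") {
  tagged <- sprintf('"%s"[%s]', terms, field)
  paste0("(", paste(tagged, collapse = paste0(" ", operator, " ")), ")")
}

# Title/abstract phrases, combined with OR
ideology <- build_query(c("political ideology", "partisanship"))

# A single MeSH heading
health <- build_query("public health", field = "MeSH Terms")

# Require both groups to match
query <- paste(ideology, "AND", health)
cat(query, "\n")

# Assumes puremoe is installed and the PubMed API is reachable:
# pmids <- puremoe::search_pubmed(query, use_pub_years = FALSE)
```

Keeping the query construction separate from the search call makes it easier to inspect or log the exact string before any request is sent.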

    # Search title/abstract fields for an exact phrase
    pmids <- puremoe::search_pubmed('("political ideology"[TiAb])',
                                    use_pub_years = FALSE)

    # Alternative: restrict the search to a range of publication years
    # pmids <- puremoe::search_pubmed('immunity',
    #                                 use_pub_years = TRUE,
    #                                 start_year = 2022,
    #                                 end_year = 2024)

    pubmed <- pmids |>
      puremoe::get_records(endpoint = 'pubmed_abstracts',
                           cores = 3,
                           sleep = 1)

    affiliations <- pmids |>
      puremoe::get_records(endpoint = 'pubmed_affiliations',
                           cores = 1,
                           sleep = 0.5)

    icites <- pmids |>
      puremoe::get_records(endpoint = 'icites',
                           cores = 3,
                           sleep = 0.25)

    pubtations <- pmids |>
      puremoe::get_records(endpoint = 'pubtations',
                           cores = 2)

    # For PMC full text, first convert PMIDs to PMC IDs
    pmc_ids <- puremoe::pmid_to_pmc(pmids, sleep = 0.5)

    # Keep only records that have a full-text URL
    pmc_fulltext <- pmc_ids$url[!is.na(pmc_ids$url)] |>
      puremoe::get_records(endpoint = 'pmc_fulltext',
                           cores = 1,
                           sleep = 1)

For the `pmc_fulltext` endpoint, first use `pmid_to_pmc()` to convert PMIDs to PMC IDs and full URLs, then pass the `url` column to `get_records()`.

| Output | Colname | Description |
|---|---|---|
| pubmed_abstracts | pmid | PMID |
| pubmed_abstracts | year | Publication year |
| pubmed_abstracts | journal | Journal name |
| pubmed_abstracts | articletitle | Article title |
| pubmed_abstracts | abstract | Article abstract |
| pubmed_abstracts | annotations | Mesh/Chem/Keywords annotations |
| pubmed_affiliations | pmid | PMID |
| pubmed_affiliations | Author | Author name |
| pubmed_affiliations | affiliation | Author affiliation |
| pubtations | pmid | PMID |
| pubtations | tiab | Title or abstract |
| pubtations | id | Entity ID |
| pubtations | entity | Extracted entity |
| pubtations | identifier | Knowledge base link (KB link) |
| pubtations | type | Entity type |
| pubtations | start | Start position (char) |
| pubtations | end | End position (char) |
| pmc_fulltext | pmid | PMID |
| pmc_fulltext | section | Full text section |
| pmc_fulltext | text | Full text content |
| icites | pmid | PMID |
| icites | is_research_article | Research article indicator |
| icites | nih_percentile | NIH percentile rank |
| icites | is_clinical | Clinical article indicator |
| icites | citation_count | Citation count |
| icites | ref_count | Reference count |
| icites | citation_net | Citation network (to/from edgelist) |
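
Since every output in the table above carries a `pmid` column, results from different endpoints can be combined with ordinary data-frame joins. The following is a minimal base R sketch of that idea. It uses small mock data frames rather than live results: all values are made up, only a subset of the documented columns is included, and treating `pmid` as a character column is an assumption about the returned data.

```r
# Mock data frames mimicking a few of the documented columns.
# All values are invented for illustration only.
abstracts <- data.frame(
  pmid = c("101", "102", "103"),
  year = c(2021, 2022, 2023),
  articletitle = c("Title A", "Title B", "Title C"),
  stringsAsFactors = FALSE
)

icites <- data.frame(
  pmid = c("101", "102", "103"),
  citation_count = c(12, 0, 5),
  is_clinical = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

pubtations <- data.frame(
  pmid = c("101", "101", "102", "103"),
  entity = c("aspirin", "headache", "TP53", "aspirin"),
  type = c("Chemical", "Disease", "Gene", "Chemical"),
  stringsAsFactors = FALSE
)

# Join abstract metadata to citation metrics on the shared pmid key
combined <- merge(abstracts, icites, by = "pmid")

# Order articles by citation count, most cited first
ranked <- combined[order(-combined$citation_count), ]
print(ranked[, c("pmid", "articletitle", "citation_count")])

# Count the distinct articles that mention each entity type
type_counts <- tapply(pubtations$pmid, pubtations$type,
                      function(x) length(unique(x)))
print(type_counts)
```

With real output from `get_records()`, the same `merge()` call would apply as long as the `pmid` columns share a type; if they do not, converting both with `as.character()` first is one option.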