---
title: "The pkgmatch package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The pkgmatch package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set (
    collapse = TRUE,
    comment = "#>"
)
```

The `pkgmatch` package is a search and matching engine for R packages. Given
either a text description or a local path to an R package, it finds the
best-matching R packages. `pkgmatch` was developed to enable rOpenSci to
identify packages similar to each new package submitted for [our software
peer-review scheme](https://ropensci.org/software-review/). Matching packages
can be found either in [rOpenSci's own package
suite](https://ropensci.org/packages/), or among all [packages currently on
CRAN](https://cran.r-project.org).

## What does the package do?

The package is best understood by example. Start by loading it:

```{r library}
library (pkgmatch)
```

Then match packages to an input string:

```{r match-text-1-fakey, eval = FALSE}
input <- "genomics and transcriptomics sequence location data"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
```

```{r redef-sim-pkgs1, eval = TRUE, echo = FALSE}
c ("biomartr", "traits", "phylotaR", "phruta", "rebird")
```

By default, the top five matching packages are printed to the screen. The
function actually returns information on all packages in the corpus, and the
returned object has a `head` method to display the first few rows:

```{r match-text-1-fakey-return, eval = FALSE}
p <- pkgmatch_similar_pkgs (input, corpus = "ropensci")
head (p)
```
```{r match-text-1-return, eval = TRUE, echo = FALSE}
data.frame (
    package = c ("biomartr", "traits", "phylotaR", "phruta", "rebird"),
    rank = 1:5
)
```

The `head` method also accepts an `n` parameter to control how many rows are
displayed. Alternatively, `as.data.frame` can be used to see the entire
`data.frame` of results.
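
As a sketch of those two options, the following chunk repeats the call from
above and then inspects the result in both ways. The value `n = 10` and the
variable name `full` are arbitrary choices made for illustration:

```{r match-text-1-head-n-fakey, eval = FALSE}
library (pkgmatch)

input <- "genomics and transcriptomics sequence location data"
p <- pkgmatch_similar_pkgs (input, corpus = "ropensci")

# Display the first ten rows of the results
head (p, n = 10)

# Convert to a data.frame to see results for every package
full <- as.data.frame (p)
nrow (full)
```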

The following line finds the equivalent matches against all packages currently
on CRAN:

```{r match-text-2-cran-fakey, eval = FALSE}
pkgmatch_similar_pkgs (input, corpus = "cran")
```

```{r redef-sim-pkgs2, eval = TRUE, echo = FALSE}
c ("omicsTools", "ggalign", "omixVizR", "singleCellHaystack", "spatialGE")
```

### Using an R package as input

The package also accepts as input a path to a local R package. The following
code downloads a "tarball" (`.tar.gz` file) from CRAN and finds matching
packages from that corpus. We of course expect the best matches against CRAN
packages to include that package itself:

```{r odbc-cran-match-fakey, eval = FALSE}
u <- "https://cran.r-project.org/src/contrib/Archive/odbc/odbc_1.5.0.tar.gz"
destfile <- file.path (tempdir (), basename (u))
download.file (u, destfile = destfile, quiet = TRUE)
pkgmatch_similar_pkgs (destfile, corpus = "cran")
```

```{r odbc-cran-match, echo = FALSE, eval = TRUE}
c ("odbc", "RODBC", "DatabaseConnector", "dbplyr", "reticulate")
```

which they indeed do. As explained in the documentation, the
`pkgmatch_similar_pkgs()` function ranks final results using
[document token-frequency
analyses](https://en.wikipedia.org/wiki/Okapi_BM25). The resulting ranks can be
seen, as above, with the `head` method:

```{r odbc-match-head-fakey, eval = FALSE}
p <- pkgmatch_similar_pkgs (destfile, corpus = "cran")
head (p)
```
```{r odbc-cran-match-head, echo = FALSE, eval = TRUE}
data.frame (
    package = c ("odbc", "RODBC", "DatabaseConnector", "dbplyr", "reticulate"),
    version = c ("1.6.4.1", "1.3-26.1", "7.1.0", "2.5.2", "1.45.0"),
    rank = 1:5
)
```
