• experimenting with RSelenium to click on the abstract link and retrieve the abstract text.
  • notebooks for experimenting with RSelenium are under the ./notebooks/01-url_query folder
  • new vignette “thousands_to_dataframe” to bind multiple pages into one dataframe. Paper types are not homogeneous: different types return different columns. Until we find an algorithm to reconcile them, we retrieve papers one document type at a time.
  • some paper types do not return complete info in the first query; the type has to be specified in the type combobox. Problem moved to test_failing.Rmd
  • new vignette paper_types.Rmd to show how to obtain papers by document type. The document types that are homogeneous and are returned in the first query are: conference papers, journal papers, general papers, presentation papers and chapter papers. Non-homogeneous: other and media types.
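
The column mismatch between document types can be illustrated with a minimal base R sketch. It is not the package's method (the package currently sidesteps the problem by retrieving one document type at a time); `bind_fill` and the column names below are hypothetical, chosen only for illustration. The idea is to align every dataframe to the union of all column names, filling absent columns with `NA`, before row-binding.

```r
# Hypothetical helper (not part of the package): row-bind data frames
# whose columns differ, filling columns missing from a data frame with NA.
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  aligned <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))   # add the absent column as NA
    }
    df[, all_cols, drop = FALSE]       # same column order everywhere
  })
  do.call(rbind, aligned)
}

# Made-up example data: two document types with different columns
conference <- data.frame(title = c("A", "B"), source = c("SPE", "OTC"),
                         stringsAsFactors = FALSE)
media <- data.frame(title = "C", publisher = "SPE",
                    stringsAsFactors = FALSE)

papers <- bind_fill(list(conference, media))
```

`papers` then has three rows and the columns `title`, `source` and `publisher`, with `NA` wherever a document type did not supply a value.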


  • use tibbles to prevent long printing of dataframes
  • use as.tibble in functions onepetro_page_to_dataframe and summary_by_xxx
  • reordering chunks in README
  • improve explanation of what to do when we have more than 1000 papers to retrieve
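
As a rough sketch of the paging idea, assuming a single query returns at most 1000 papers, the requests needed to cover a larger result set can be planned as below. `page_plan`, `start` and `rows` are hypothetical names used only for illustration, not necessarily those used by the package.

```r
# Hypothetical helper: compute the start offset and size of each request
# needed to cover `total` results when one request returns at most `page_size`.
page_plan <- function(total, page_size = 1000) {
  if (total <= 0) {
    return(data.frame(start = numeric(0), rows = numeric(0)))
  }
  starts <- seq(0, total - 1, by = page_size)
  data.frame(start = starts,
             rows  = pmin(page_size, total - starts))
}

plan <- page_plan(2661)
# plan$start is 0, 1000, 2000
# plan$rows  is 1000, 1000, 661
```

Each row of `plan` corresponds to one request; the resulting pages can then be bound into a single dataframe, provided they share the same columns.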


  • use as.tibble to avoid long printing of dataframes
  • show first attempt at splitting paper pages into groups of 1000, using the example of “neural networks”, which has 2661 conference-papers. For the time being only one paper type is used, because other document types have a different number of columns, which causes a conflict when binding the dataframes. Working on it.
  • build site with pkgdown
  • add documentation for datasets
  • add tolerance to expect_equal because the number of papers keeps growing
  • Added a NEWS.md file to track changes to the package.
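
To illustrate the tolerance change, here is a minimal base R sketch using `all.equal`, whose `tolerance` argument is compared against the relative difference for numeric values. The paper counts and the 1% tolerance are made up for illustration and are not the values used in the package's tests.

```r
expected_papers <- 2661   # count recorded when the test was written (illustrative)
current_papers  <- 2670   # made-up count returned by a later query

# The relative difference is 9 / 2661, roughly 0.0034, which is within 1%.
within_tol <- isTRUE(all.equal(expected_papers, current_papers,
                               tolerance = 0.01))

# In a testthat test, the equivalent check would look like:
# testthat::expect_equal(current_papers, expected_papers, tolerance = 0.01)
```

A relative tolerance lets the test keep passing as the count drifts upward slowly, while a large jump would still fail.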