Data Sources

For most of these posts, the data is scraped from the web and then cleaned and reshaped before being used in its final form. Final race results, aid station or lap-level race results, and weather data can all be combined and used in analysis and visualizations. These data sources often come from separate locations and have to be merged together.

The R package I use for web scraping is rvest. I use the R tidyverse for data manipulation and cleaning (and for a lot of the analysis).
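As a rough illustration of that workflow, here is a minimal sketch of scraping a results table with rvest and joining it to weather data with dplyr. It is not the code behind any particular post: the HTML snippet, the column names (`runner`, `year`, `finish_hours`, `high_temp_f`), and every value in it are made-up placeholders, and an inline page is used so the example runs without a network connection. A real script would load the page with `read_html()` and a URL instead.

```r
library(rvest)
library(dplyr)

# Hypothetical stand-in for a race results page (placeholder values only).
# A real script would call read_html() on the page's URL instead.
results_page <- minimal_html('
  <table id="results">
    <tr><th>runner</th><th>year</th><th>finish_hours</th></tr>
    <tr><td>Runner A</td><td>2018</td><td>24.5</td></tr>
    <tr><td>Runner B</td><td>2019</td><td>27.1</td></tr>
  </table>
')

# Pull the table out of the page and convert it to a data frame.
results <- results_page %>%
  html_element("#results") %>%
  html_table()

# Hypothetical weather data from a separate source, keyed by race year.
weather <- tibble(
  year = c(2018L, 2019L),
  high_temp_f = c(88, 95)
)

# Merge the two sources so each result carries that year's weather.
combined <- results %>%
  left_join(weather, by = "year")

print(combined)
```

A left join keeps every race result even when no weather record matches, which makes gaps in the second source easy to spot as `NA` values.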

Most of the time, combining, manipulating, and cleaning the data is messy and time-consuming. Once the merged data set is built, the fun of digging into it can begin.

In an effort to show the data underlying my analysis, I will be placing my cleaned data in a GitHub repository. I will include the R code used to gather and clean the data, as well as the data in its post-processed form. If you like, you can download the data to dig into on your own, or run the R scripts to scrape the data from the original sources. I intend to provide comments on where the original data comes from and on the steps taken to gather, clean, and post-process it.