The R analytics blog

This section, intended for the more technically minded readers, will deal with real data analysis problems and their solutions – mostly in R.



A simple static website generator with purrr


R Analytics Blog – 2017 / 07 / 27

Today, I want to present a simple way to use purrr to create a static website generator. Of course, there are Jekyll, Hugo and the blogdown package, but in many cases you may find yourself, like I did, with your own bits and pieces of html, no time to learn yet another language, and in need of a way to put all of this into a structured and consistent static website.

The approach I took is

  • keep all the website elements in one large html file, the “master page” (a collection of files works too)
  • use html comments to define the start and end of the different sections, of which the different pages of the website will be composed
  • define all links using hrefs on #ids

and write an R/purrr script to

  • manage the recombination of the sections onto different pages
  • replace all href-attributes with correct link addresses

To show how this works, assume our “master page” looks like this

<!-- START_HEADER --> 
<!DOCTYPE html>
<html><head>
<style type="text/css">

body, html {height: 100%;}
.A { border: 2px solid orange; height: 30%; width: 60%;}
.B { background-color: darkorange ; height: 80%; width: 60%;}
.C { border: 2px solid lightblue; height: 90%; width: 60%;}
.D { background-color: steelblue ; height: 70%; width: 60%;}

.navbar {width: 60%; background-color: grey; position: fixed; top: 0;}

a { display: inline-block; padding: 0.5rem;}
</style>
</head>
<!-- END_HEADER --> 
<!-- START_TOP --> 
<body>
<div class="navbar">
<a href="#HOME"> Home </a>
<a href="#BB"> Link to BB</a>
<a href="#CC"> Link to CC</a>
</div>
<br>
<br>
<h1> My website </h1>
<hr>
<!-- END_TOP --> 
<!-- START_AA --> 
<div class="A"> 
<h3> AA - Intro material</h3>
<a href="#DD"> LINK to DD </a> 
</div>
<!-- END_AA --> 
<!-- START_BB --> 
<div id="BB"> </div>
<br>
<br>
<div class="B"> 
<h3>Section BB</h3>
<a href="#CC"> LINK to CC </a> 
</div>
<!-- END_BB --> 
<!-- START_CC -->
<div id="CC"> </div>
<br>
<br>
<div class="C"> 
<h3>Section CC</h3>
<a href="#BB"> LINK to BB </a> 
</div>
<!-- END_CC --> 
<!-- START_DD -->
<div id="DD"> </div>
<br>
<br>
<div class = "D"> 
<h3>DD - Some other content </h3>
<a href="#BB"> LINK to BB </a> 
<a href="#CC"> LINK to CC </a> 
</div>
<!-- END_DD -->
</body> </html>

This html contains all the parts we need to piece together our website. All we do here is mark the beginning and end of each section with html comments such as <!-- START_DD --> and <!-- END_DD -->. For simplicity, the id-attributes are placed and named to match the sections. In a real project, the CSS would usually be kept in a separate file.
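The marker idea can be tried out on a toy example. The following is only a minimal sketch: the helper `get_block()` and the tiny two-block page are made up for illustration and are not part of the script developed below.

```r
library(stringr)

# Toy "master page" with two blocks (invented for this sketch)
page <- c(
  "<!-- START_AA -->",
  "<div class=\"A\"> intro </div>",
  "<!-- END_AA -->",
  "<!-- START_BB -->",
  "<div id=\"BB\"> </div>",
  "<div class=\"B\"> section B </div>",
  "<!-- END_BB -->"
)

# Hypothetical helper: return the lines of one block, markers included
get_block <- function(lines, name) {
  start <- str_which(lines, fixed(paste0("<!-- START_", name, " -->")))
  end   <- str_which(lines, fixed(paste0("<!-- END_", name, " -->")))
  # each marker must occur exactly once, and START must come first
  stopifnot(length(start) == 1, length(end) == 1, start < end)
  lines[start:end]
}

block_bb <- get_block(page, "BB")
block_bb
```

The script below does the same for all blocks at once by collecting the marker positions in a table.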

Now, let’s say we want a front page with intro material A and one page each for the material in sections B and C+D.

We start by reading the file into a nested data frame, one nest for each of the sections we have defined

library(tidyverse)
library(stringr) 

PATHOUT = "./Site/"
fildat <- readLines("masterpage.html")

## marker for html blocks
markerdat <- tibble(
  markerline = fildat %>% str_which("<!-- START_|<!-- END_"),
  orig = fildat[markerline],
  markertype = str_extract(orig, "START|END"),
  markername = str_replace_all(orig, "<!--| |START_|END_|-->|\\t", ""))

markerall <- markerdat %>% select( -orig ) %>%
  spread(key = markertype, value = markerline) %>%
  mutate(origlines = map2( START, END, ~`[`(fildat, .x:.y))) %>%
  arrange(START)

markerall
## # A tibble: 6 x 4
##   markername   END START  origlines
##        <chr> <int> <int>     <list>
## 1     HEADER    17     1 <chr [17]>
## 2        TOP    29    18 <chr [12]>
## 3         AA    35    30  <chr [6]>
## 4         BB    44    36  <chr [9]>
## 5         CC    53    45  <chr [9]>
## 6         DD    63    54 <chr [10]>
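If a START or END marker is missing, spread() leaves an NA in the corresponding column and the `START:END` range can no longer be built. A small check before that step can catch this. The sketch below uses a made-up page in which END_BB is missing; the variable names are assumptions of this example.

```r
library(stringr)

# Toy input: END_BB is missing on purpose
lines <- c("<!-- START_AA -->", "<p>a</p>", "<!-- END_AA -->",
           "<!-- START_BB -->", "<p>b</p>")

# Column 2 holds the marker type, column 3 the block name
m <- str_match(lines, "<!--\\s*(START|END)_([A-Z0-9_]+)\\s*-->")
found <- !is.na(m[, 1])
markers <- data.frame(line = which(found),
                      type = m[found, 2],
                      name = m[found, 3])

# Count START and END markers per block name
counts <- table(factor(markers$name),
                factor(markers$type, levels = c("START", "END")))

# Blocks that do not have exactly one START and one END
unpaired <- rownames(counts)[counts[, "START"] != 1 | counts[, "END"] != 1]
unpaired
```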

We then define how our website should be built by declaring the order of the building blocks for each page.

pagedef <- list(
  main = tibble( 
     filename = "index.html",
     blocks = c("HEADER", 
		 "<style> .navbar { background-color: #AAAAAA; } </style>", 
		 "<div id = \"HOME\"> </div>",
		 "TOP", 
		 "AA", 
		 "<hr> <p> A few additional remarks </p> <hr>",
		 "</body> </html>") ),
  sectB = tibble( 
     filename = "sectB.html",
     blocks = c("HEADER", 
		 "<style> .navbar { background-color: #CCCCCC; } </style>", 
		 "TOP", 
		 "BB", 
		 "</body> </html>") ),
  sectCD = tibble( 
     filename = "sectCD.html",
     blocks = c("HEADER", 
		"<style> .navbar { background-color: #666666; } </style>", 
		"TOP", 
		"CC",
		"DD",
		"</body> </html>") ) ) %>% 
  bind_rows()

Identifying the corresponding lines of html for each block then really becomes no more than a join-operation:

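pagecompile <- pagedef  %>% 
  left_join(markerall, by = c("blocks" = "markername")) %>%
  ## blocks without a matching marker are kept as literal html
  mutate(publishlines = 
           map2(blocks, origlines, ~if(is.null(.y)) {.x} else {.y})) %>% 
  select(filename, publishlines) %>% 
  ## name the columns explicitly, as required by tidyr >= 1.0
  unnest(publishlines) %>%
  nest(data = -filename)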

pagecompile
## # A tibble: 3 x 2
##      filename              data
##         <chr>            <list>
## 1  index.html <tibble [39 x 1]>
## 2  sectB.html <tibble [40 x 1]>
## 3 sectCD.html <tibble [50 x 1]>
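One consequence of the fallback in the join is that a mistyped block name is not an error: it finds no marker and is written to the page as literal text. A simple guard, sketched here with made-up inputs, is to flag entries that neither look like html nor match a known marker name. The rule "contains a `<`" is an assumption of this sketch, not something the script itself relies on.

```r
# Marker names as found in the master page
known_blocks <- c("HEADER", "TOP", "AA", "BB", "CC", "DD")

# A page definition with a typo: "BBB" instead of "BB"
blocks <- c("HEADER",
            "<style> .navbar { background-color: #CCCCCC; } </style>",
            "TOP",
            "BBB",
            "</body> </html>")

# Assumption: literal html snippets always contain a "<"
looks_like_html <- grepl("<", blocks, fixed = TRUE)

# Entries that are neither html nor a known block name
suspicious <- blocks[!looks_like_html & !(blocks %in% known_blocks)]
suspicious
```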

This defines the three html files we want for our website. It remains to make sure that the original href-attributes are adapted to this structure. We start by collecting the info on all (types of) link sources and link targets in our three website files

linksrc <- pagecompile %>%
 mutate(href =  map(data, ~str_match(.$publishlines, 
                   pattern = "href\\s*=\\s*\"(#[A-Z0-9_]*)\"" )[,2]) %>%
          map(~unique(.[!is.na(.)]))) %>% 
  select(srcfile = filename, href) %>%
  unnest(href)

linktgt <- pagecompile %>%
  mutate(id =  map(data, ~str_match(.$publishlines, 
                   pattern = "id\\s*=\\s*\"([A-Z0-9_]*)\"" )[,2]) %>%
          map(~paste0("#", unique(.[!is.na(.)])))) %>% 
  select(tgtfile = filename, id) %>%
  unnest(id)

linktgt
## # A tibble: 4 x 2
##       tgtfile    id
##         <chr> <chr>
## 1  index.html #HOME
## 2  sectB.html   #BB
## 3 sectCD.html   #CC
## 4 sectCD.html   #DD

We would be in trouble if an id-attribute appeared more than once in the linktgt-table. This could happen if we wanted to reuse some of our html-blocks on several pages. In that case, we would have to introduce an additional rule that decides which of the copies should be considered the true target for links.
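A minimal sketch of how such duplicates could be detected, using a made-up target table in which "#BB" appears on two pages. The rule shown at the end (keep the first page on which an id appears) is just one possible choice and an assumption of this example.

```r
library(dplyr)
library(tibble)

# Toy target table: "#BB" is reused on a second page on purpose
linktgt_demo <- tibble(
  tgtfile = c("index.html", "sectB.html", "sectCD.html", "extra.html"),
  id      = c("#HOME", "#BB", "#CC", "#BB")
)

# ids that occur on more than one page
dup_ids <- linktgt_demo %>%
  count(id, name = "n_pages") %>%
  filter(n_pages > 1)

# One possible rule: the first page listed wins
linktgt_unique <- linktgt_demo %>%
  distinct(id, .keep_all = TRUE)

dup_ids
linktgt_unique
```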

For our little demonstration here, I have avoided this and other complexities. One additional rule, however, already needs taking care of in our example: On pages like sectB.html, whose content is exclusively from our block B, we would not want the link to jump to the id in the middle of the page. Rather, in that case, it seems more appropriate to link to the top of the page. Let’s have a look at the resulting link-structure:

link2top <- c("#BB", "#HOME")

linkInOut <- linksrc %>% 
  full_join(linktgt, by = c("href" = "id")) %>%
  mutate(finalLinkName = paste0(ifelse(srcfile == tgtfile, "", tgtfile), 
                                ifelse(href %in% link2top,"", href )))%>%
  select(srcfile, href, finalLinkName) 

linkInOut 
## # A tibble: 10 x 3
##        srcfile  href  finalLinkName
##          <chr> <chr>          <chr>
##  1  index.html #HOME               
##  2  index.html   #BB     sectB.html
##  3  index.html   #CC sectCD.html#CC
##  4  index.html   #DD sectCD.html#DD
##  5  sectB.html #HOME     index.html
##  6  sectB.html   #BB               
##  7  sectB.html   #CC sectCD.html#CC
##  8 sectCD.html #HOME     index.html
##  9 sectCD.html   #BB     sectB.html
## 10 sectCD.html   #CC            #CC
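Because this is a full join, it also reveals inconsistencies. A link without a target ends up with an NA in tgtfile (and paste0() would then turn it into a link name like "NA#ZZ"), while an id that no page links to ends up with an NA in srcfile. The sketch below shows this on made-up tables; "#ZZ" and "#DD" are chosen only to trigger the two cases.

```r
library(dplyr)
library(tibble)

# "#ZZ" has no target anywhere
linksrc_demo <- tibble(srcfile = c("index.html", "index.html"),
                       href    = c("#BB", "#ZZ"))

# "#DD" is never linked to
linktgt_demo <- tibble(tgtfile = c("sectB.html", "sectCD.html"),
                       id      = c("#BB", "#DD"))

joined <- full_join(linksrc_demo, linktgt_demo, by = c("href" = "id"))

dangling <- filter(joined, is.na(tgtfile))   # links pointing nowhere
unused   <- filter(joined, is.na(srcfile))   # ids without incoming links

dangling
unused
```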

Now, all that remains to be done is to replace the href-attributes and to write out the resulting html-files. In a real-life application you may have to work a little more to avoid mismatches. Also, other features, such as navigation menus that update automatically when you add a blog entry, require a little extra work. But the overall procedure seems sound, and a version of this program produces the (small) website you are looking at right now in a matter of seconds.
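The replacement step relies on str_replace_all() accepting a named character vector, where the names are the regular expressions and the values are the replacements. Here is a small sketch on two made-up lines of html. The patterns in this sketch end with the closing quote, so that "#CC" cannot match the beginning of a longer id such as "#CCC".

```r
library(stringr)

# Two toy lines with slightly different spacing around "="
html <- c("<a href=\"#BB\"> Link to BB</a>",
          "<a href = \"#CC\"> Link to CC</a>")

# names = patterns, values = replacements
replacements <- c(
  "href\\s*=\\s*\"#BB\"" = "href = \"sectB.html\"",
  "href\\s*=\\s*\"#CC\"" = "href = \"sectCD.html#CC\""
)

out <- str_replace_all(html, replacements)
out
```

The patterns are applied one after the other, so a replacement value should not itself match a later pattern.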


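## prepare replacement pattern
## (pattern and replacement both include the closing quote, so that "#CC"
##  cannot match the beginning of a longer id such as "#CCC")
linkInOut <- linkInOut %>% 
  group_by(srcfile) %>% nest() %>%
  mutate(repPattern =  
      map(data, ~structure(paste0("href = \"", .$finalLinkName, "\""), 
                    .Names = paste0("href\\s*=\\s*\"", .$href, "\"")))) %>%
  select(-data)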

## run replacements on data
pagecompile <- pagecompile %>% 
  left_join(linkInOut, by = c("filename" = "srcfile")) %>%
  mutate(dataTrans = map2(data, repPattern, 
                          ~str_replace_all(.x$publishlines, .y)))

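## write out pages (create the output directory first if it is missing)
dir.create(PATHOUT, showWarnings = FALSE, recursive = TRUE)
pagecompile <- pagecompile %>%
  mutate(fullname = paste0(PATHOUT, filename))
walk2(pagecompile$fullname, pagecompile$dataTrans, ~writeLines(.y, .x))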
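Once the pages are written, a quick sanity check is to read each file back and look for in-page links ("#ID") whose id does not exist on that page. The following is only a sketch: the helper `check_page()`, the temporary output folder and the toy file content are assumptions made for this example.

```r
library(stringr)

# Hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "Site")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Toy page: the link to "#BB" is broken, since there is no such id here
writeLines(c("<div id=\"CC\"> </div>",
             "<a href = \"#CC\"> ok </a>",
             "<a href = \"#BB\"> broken </a>"),
           file.path(out_dir, "sectCD.html"))

# Hypothetical helper: in-page link targets that are missing on the page
check_page <- function(path) {
  lines <- readLines(path)
  hrefs <- str_match(lines, "href\\s*=\\s*\"#([A-Z0-9_]+)\"")[, 2]
  ids   <- str_match(lines, "id\\s*=\\s*\"([A-Z0-9_]+)\"")[, 2]
  setdiff(hrefs[!is.na(hrefs)], ids[!is.na(ids)])
}

broken <- check_page(file.path(out_dir, "sectCD.html"))
broken
```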


About me

I am a consultant and project manager in marketing and business analytics. Having worked in the area for more than 15 years and having led the Data Science and Analytics teams at IRI Germany from 2009 to 2016, I am now again working as an independent consultant focusing on applications of Big Data and AI in marketing.

Quantitative Consulting
www.quantitative-consulting.eu

Boris Vaillant - Quantitative Consulting 17