purrr
Today, I want to present a simple way to use purrr to create a static website generator. Of course, there are Jekyll, Hugo and the blogdown package, but in many cases you may find yourself, like I did, with your own bits and pieces of html, no time to learn yet another language, and in need of a way to put all this into a structured and consistent static website.
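The approach I took is to keep all the material in a single html “master page” and write an R/purrr script to cut it into blocks and reassemble these blocks into the pages of the website.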
To show how this works, assume our “master page” looks like this:
<!-- START_HEADER -->
<!DOCTYPE html>
<html><head>
<style type="text/css">
body, html {height: 100%;}
.A { border: 2px solid orange; height: 30%; width: 60%;}
.B { background-color: darkorange ; height: 80%; width: 60%;}
.C { border: 2px solid lightblue; height: 90%; width: 60%;}
.D { background-color: steelblue ; height: 70%; width: 60%;}
.navbar {width: 60%; background-color: grey; position: fixed; top: 0;}
a { display: inline-block; padding: 0.5rem;}
</style>
</head>
<!-- END_HEADER -->
<!-- START_TOP -->
<body>
<div class="navbar">
<a href="#HOME"> Home </a>
<a href="#BB"> Link to BB</a>
<a href="#CC"> Link to CC</a>
</div>
<br>
<br>
<h1> My website </h1>
<hr>
<!-- END_TOP -->
<!-- START_AA -->
<div class="A">
<h3> AA - Intro material</h3>
<a href="#DD"> LINK to DD </a>
</div>
<!-- END_AA -->
<!-- START_BB -->
<div id="BB"> </div>
<br>
<br>
<div class="B">
<h3>Section BB</h3>
<a href="#CC"> LINK to CC </a>
</div>
<!-- END_BB -->
<!-- START_CC -->
<div id="CC"> </div>
<br>
<br>
<div class="C">
<h3>Section CC</h3>
<a href="#BB"> LINK to BB </a>
</div>
<!-- END_CC -->
<!-- START_DD -->
<div id="DD"> </div>
<br>
<br>
<div class = "D">
<h3>DD - Some other content </h3>
<a href="#BB"> LINK to BB </a>
<a href="#CC"> LINK to CC </a>
</div>
<!-- END_DD -->
</body> </html>
This html contains all the parts we need to piece together our website. All we do here is mark each of our sections with html comments à la <!-- START_DD -->. For simplicity, we also place and name our id-attributes accordingly. Also, in practice, we would usually keep the CSS in a separate file.
Now, let’s say we want a front page with intro material A and one page each for the material in sections B and C+D.
We start by reading the file into a nested data frame, one nest for each of the sections we have defined:
library(tidyverse)
library(stringr)
PATHOUT = "./Site/"
fildat <- readLines("masterpage.html")
## marker for html blocks
markerdat <- tibble(
  markerline = fildat %>% str_which("<!-- START_|<!-- END_"),
  orig       = fildat[markerline],
  markertype = str_extract(orig, "START|END"),
  markername = str_replace_all(orig, "<!--| |START_|END_|-->|\\t", ""))
## one row per block, with its START line, END line and the lines in between
markerall <- markerdat %>% select(-orig) %>%
  spread(key = markertype, value = markerline) %>%
  mutate(origlines = map2(START, END, ~fildat[.x:.y])) %>%
  arrange(START)
markerall
## # A tibble: 6 x 4
##   markername   END START origlines
##   <chr>      <int> <int> <list>
## 1 HEADER        17     1 <chr [17]>
## 2 TOP           29    18 <chr [12]>
## 3 AA            35    30 <chr [6]>
## 4 BB            44    36 <chr [9]>
## 5 CC            53    45 <chr [9]>
## 6 DD            63    54 <chr [10]>
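The table above is only usable if every START marker has a matching END marker further down the file. A minimal sketch of such a sanity check, run here on a small made-up vector rather than on masterpage.html (the names toy, toy_markers, marker_check and broken_blocks are illustrative and not part of the original script):

```r
library(tidyverse)

# Toy stand-in for the master page; the real input would come from readLines()
toy <- c("<!-- START_AA -->", "<p>a</p>", "<!-- END_AA -->",
         "<!-- START_BB -->", "<p>b</p>")   # END_BB is left out on purpose

# Same marker extraction as in the script above
toy_markers <- tibble(
  markerline = str_which(toy, "<!-- START_|<!-- END_"),
  orig       = toy[markerline],
  markertype = str_extract(orig, "START|END"),
  markername = str_replace_all(orig, "<!--| |START_|END_|-->|\\t", "")
)

# A block is usable only if it has exactly one START and one END, in that order
marker_check <- toy_markers %>%
  group_by(markername) %>%
  summarise(
    n_start = sum(markertype == "START"),
    n_end   = sum(markertype == "END"),
    ordered = n_start == 1 && n_end == 1 &&
      markerline[markertype == "START"] < markerline[markertype == "END"],
    .groups = "drop"
  )

broken_blocks <- marker_check %>% filter(!ordered) %>% pull(markername)
broken_blocks
```

Here broken_blocks contains "BB", because its END marker is missing. Running a check like this before the spread step would catch such a block before it produces an NA line range.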
We then define how our website should be built by declaring the order of the building blocks for each page.
pagedef <- list(
  main = tibble(
    filename = "index.html",
    blocks = c("HEADER",
               "<style> .navbar { background-color: #AAAAAA; } </style>",
               "<div id = \"HOME\"> </div>",
               "TOP",
               "AA",
               "<hr> <p> A few additional remarks </p> <hr>",
               "</body> </html>") ),
  sectB = tibble(
    filename = "sectB.html",
    blocks = c("HEADER",
               "<style> .navbar { background-color: #CCCCCC; } </style>",
               "TOP",
               "BB",
               "</body> </html>") ),
  sectCD = tibble(
    filename = "sectCD.html",
    blocks = c("HEADER",
               "<style> .navbar { background-color: #666666; } </style>",
               "TOP",
               "CC",
               "DD",
               "</body> </html>") ) ) %>%
  bind_rows()
Identifying the corresponding lines of html for each block then really becomes no more than a join-operation:
pagecompile <- pagedef %>%
  left_join(markerall, by = c("blocks" = "markername")) %>%
  ## blocks without a matching marker are literal html and are kept as they are
  mutate(publishlines =
           map2(blocks, origlines, ~if(is.null(.y)) {.x} else {.y})) %>%
  select(filename, publishlines) %>%
  ## current tidyr versions expect the columns to be named explicitly
  unnest(cols = publishlines) %>%
  nest(data = -filename)
pagecompile
## # A tibble: 3 x 2
##   filename    data
##   <chr>       <list>
## 1 index.html  <tibble [39 x 1]>
## 2 sectB.html  <tibble [40 x 1]>
## 3 sectCD.html <tibble [50 x 1]>
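One side effect of this rule is that a misspelled block name finds no marker and is therefore published as literal text. A minimal sketch of a guard against this, assuming (my assumption, not a rule from the script above) that every literal html block starts with "<"; known_markers, toy_pagedef and suspicious are illustrative names:

```r
library(tidyverse)

# Marker names as found in the master page
known_markers <- c("HEADER", "TOP", "AA", "BB", "CC", "DD")

# Toy page definition; "TPO" is a deliberate typo for "TOP"
toy_pagedef <- tibble(
  filename = "index.html",
  blocks   = c("HEADER", "<div id = \"HOME\"> </div>", "TPO", "AA")
)

# Flag blocks that are neither a known marker nor something that looks like html
suspicious <- toy_pagedef %>%
  filter(!blocks %in% known_markers,
         !str_detect(blocks, "^\\s*<"))

suspicious$blocks
```

In this toy example only "TPO" is flagged. In the real script, markerall$markername could take the place of known_markers.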
This defines the three html files we want for our website. It remains to make sure that the original href-attributes are adapted to this structure. We start by collecting the info on all (types of) link sources and link targets in our three website files:
## all distinct link sources (href) per output file
linksrc <- pagecompile %>%
  mutate(href = map(data, ~str_match(.$publishlines,
                      pattern = "href\\s*=\\s*\"(#[A-Z0-9_]*)\"")[,2]) %>%
           map(~unique(.[!is.na(.)]))) %>%
  select(srcfile = filename, href) %>%
  unnest(cols = href)
## all distinct link targets (id) per output file
linktgt <- pagecompile %>%
  mutate(id = map(data, ~str_match(.$publishlines,
                    pattern = "id\\s*=\\s*\"([A-Z0-9_]*)\"")[,2]) %>%
           map(~paste0("#", unique(.[!is.na(.)])))) %>%
  select(tgtfile = filename, id) %>%
  unnest(cols = id)
linktgt
## # A tibble: 4 x 2
##   tgtfile     id
##   <chr>       <chr>
## 1 index.html  #HOME
## 2 sectB.html  #BB
## 3 sectCD.html #CC
## 4 sectCD.html #DD
We would be in trouble if an id-attribute appeared more than once in the linktgt table. This could happen if we wanted to reuse some of our html-blocks on several pages. In that case, we would have to introduce an additional rule stating which of the copies should be considered the true target for links.
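A minimal sketch of how such duplicates could be detected, using a made-up target table in which "#BB" appears on two pages. The tie-break shown at the end (keep the first page listed) is only one possible rule and my own assumption; toy_linktgt, duplicated_ids and linktgt_unique are illustrative names:

```r
library(tidyverse)

# Toy target table; "#BB" is reused on two pages
toy_linktgt <- tibble(
  tgtfile = c("index.html", "sectB.html", "sectCD.html", "index.html"),
  id      = c("#HOME", "#BB", "#CC", "#BB")
)

# ids that occur on more than one page
duplicated_ids <- toy_linktgt %>%
  count(id, name = "n_pages") %>%
  filter(n_pages > 1)

# One possible tie-break: the first page listed wins
linktgt_unique <- toy_linktgt %>%
  distinct(id, .keep_all = TRUE)

duplicated_ids
linktgt_unique
```

With this toy input, duplicated_ids lists "#BB" with two pages, and linktgt_unique keeps sectB.html as its target.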
For our little demonstration here, I have avoided this and other complexities. One additional rule, however, already needs taking care of in our example: On pages like sectB.html, whose content is exclusively from our block B, we would not want the link to jump to the id in the middle of the page. Rather, in that case, it seems more appropriate to link to the top of the page. Let’s have a look at the resulting link-structure:
link2top <- c("#BB", "#HOME")
linkInOut <- linksrc %>%
  full_join(linktgt, by = c("href" = "id")) %>%
  mutate(finalLinkName = paste0(ifelse(srcfile == tgtfile, "", tgtfile),
                                ifelse(href %in% link2top, "", href))) %>%
  select(srcfile, href, finalLinkName)
linkInOut
## # A tibble: 10 x 3
##    srcfile     href  finalLinkName
##    <chr>       <chr> <chr>
##  1 index.html  #HOME
##  2 index.html  #BB   sectB.html
##  3 index.html  #CC   sectCD.html#CC
##  4 index.html  #DD   sectCD.html#DD
##  5 sectB.html  #HOME index.html
##  6 sectB.html  #BB
##  7 sectB.html  #CC   sectCD.html#CC
##  8 sectCD.html #HOME index.html
##  9 sectCD.html #BB   sectB.html
## 10 sectCD.html #CC   #CC
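Because the two tables are combined with a full join, a link whose id exists on none of the pages gets an NA target file, and paste0 would then write the text "NA" into the link. A small sketch of a check for such dangling links, again on made-up tables (toy_linksrc, toy_linktgt and dangling are illustrative names, and "#ZZ" is a hypothetical id):

```r
library(tidyverse)

# Toy link sources; "#ZZ" points to an id that exists nowhere
toy_linksrc <- tibble(
  srcfile = c("index.html", "index.html"),
  href    = c("#BB", "#ZZ")
)
toy_linktgt <- tibble(tgtfile = "sectB.html", id = "#BB")

# Links without any matching target
dangling <- toy_linksrc %>%
  anti_join(toy_linktgt, by = c("href" = "id"))

dangling$href
```

Only "#ZZ" is reported here. In the example of this post every link has a target, so the corresponding check on linksrc and linktgt should come back empty.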
Now, all that remains to be done is to replace the href-attributes and to write out the resulting html-files. In a real-life application you may have to work a little more to avoid mismatches. Also, other features, like navigation menus that update automatically when you add a blog entry, require a little extra work. But the overall procedure seems sound, and a version of this program produces the (small) website you are currently looking at in a matter of seconds.
## prepare replacement pattern: names are the regex patterns, values the replacements
linkInOut <- linkInOut %>%
  group_by(srcfile) %>% nest() %>%
  mutate(repPattern =
           map(data, ~structure(paste0("href = \"", .$finalLinkName),
                                .Names = paste0("href\\s*=\\s*\"", .$href)))) %>%
  select(-data)
## run replacements on data
pagecompile <- pagecompile %>%
  left_join(linkInOut, by = c("filename" = "srcfile")) %>%
  mutate(dataTrans = map2(data, repPattern,
                          ~str_replace_all(.x$publishlines, .y)))
## write out pages (writeLines fails if the output directory does not exist)
dir.create(PATHOUT, showWarnings = FALSE, recursive = TRUE)
pagecompile <- pagecompile %>%
  mutate(fullname = paste0(PATHOUT, filename),
         written = map2(fullname, dataTrans, ~writeLines(.y, .x)))
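The replacement step relies on str_replace_all accepting a named character vector, where the names are the patterns and the values are the replacements. One way to reduce the mismatches mentioned above (for example, a pattern for #BB also matching the start of a longer id) might be to include the closing quote in both the pattern and the replacement. A sketch on a made-up line of html; the id #BB2 and the names html_line, rep_pattern and rewritten are hypothetical:

```r
library(stringr)

# Toy line with two links; only the first should be rewritten
html_line <- '<a href="#BB">B</a> <a href = "#BB2">B2</a>'

# names = regex patterns (closing quote included), values = replacements
rep_pattern <- c('href\\s*=\\s*"#BB"' = 'href = "sectB.html"')

rewritten <- str_replace_all(html_line, rep_pattern)
rewritten
```

The link to #BB now points to sectB.html, while the link to #BB2 is left untouched. Without the closing quote in the pattern, the second link would have been rewritten as well.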
Boris Vaillant - Quantitative Consulting 17