How the Kyber R package connects Google Sheets, RMarkdown, GitHub, and Agenda docs for open education
Our 8th Openscapes Community Call featured a “celebrity interview” with Dr. Sean Kross. Sean is a Data Staff Scientist at the Fred Hutch Data Science Lab. His work focuses on understanding data science as a practice, and his approach combines computational, statistical, ethnographic, and design-driven methods. Sean earned his PhD in Human-Computer Interaction at UC San Diego, where he was advised by Philip Guo, and interned at Microsoft Research. Before grad school, Sean was the Chief Technology Officer at the Johns Hopkins Data Science Lab. He is a frequent consultant for data analysis and software development projects, and a maintainer of several open source software projects. Sean was interviewed by Stefanie Butland, MSc, Openscapes Team member.
Sean joined us from his office-with-a-great-view at Fred Hutch Cancer Center in the South Lake Union area of Seattle.
What is Kyber?
Kyber is an R package that contains tools for setting up learning cohorts on GitHub, purpose-built for the Openscapes Champions Program. Kyber reads in data from Google Sheets and uses it to create RMarkdown documents, which are then exported to Google Docs. It also sets up repositories and files and organizes people on GitHub. Kyber replaces manual steps with R functions while preserving the ability to edit the outputs, so we’re not constrained by the automation. How cool and practical is that?!
Watch the video (starting at 44:30) of Julie screensharing the Openscapes call metadata and RMarkdown components: the ingredients from which Kyber makes an agenda sandwich 🥪.
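To make the “agenda sandwich” idea a little more concrete, here is a minimal Python sketch of the same general pattern: each row of tabular call metadata is combined with a Markdown template to produce one agenda file. This is not Kyber’s API (Kyber is written in R and builds on RMarkdown); the CSV text stands in for a Google Sheet, and the column names, placeholder values, template, and `render_agendas` function are all hypothetical.

```python
import csv
import io
from pathlib import Path
from string import Template

# Hypothetical placeholder call metadata; Kyber itself reads this kind of data from Google Sheets.
CALL_METADATA = """call_number,date,start_time,end_time,topic
1,2022-09-13,09:00,10:30,Welcome and open data science
2,2022-09-27,09:00,10:30,Version control with GitHub
"""

# Hypothetical agenda template: each $placeholder is filled from a column of the same name.
AGENDA_TEMPLATE = Template(
    "# Call $call_number: $topic\n"
    "\n"
    "**Date:** $date, $start_time to $end_time\n"
    "\n"
    "## Discussion prompts\n"
)


def render_agendas(metadata_csv: str, out_dir: Path) -> list[Path]:
    """Render one Markdown agenda per metadata row and return the written paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for row in csv.DictReader(io.StringIO(metadata_csv)):
        # substitute() raises KeyError if the template uses a column the data lacks.
        text = AGENDA_TEMPLATE.substitute(row)
        path = out_dir / f"agenda-call-{int(row['call_number']):02d}.md"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Calling `render_agendas(CALL_METADATA, Path("agendas"))` writes `agenda-call-01.md` and `agenda-call-02.md`. Keeping the data and the template separate means a schedule change only touches the data, while the agendas stay ordinary editable text files, which mirrors the editability the paragraph above describes.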
What was the motivation for Kyber?
Sean Kross assisted our inaugural 2019 Openscapes Champions Cohort, learning from and supporting 7 academic marine science research teams. Because he was so deeply involved, he saw firsthand how much of the infrastructure that made the cohort happen was created manually. Sharing that curriculum with more groups would mean repeating that work, so Sean asked:
- Openscapes already makes heavy use of Google Drive and GitHub; how can we use R to bring them together?
- How far could we push RMarkdown’s parameterized reporting features?
- How much automation can we drive with minimal programming for the end user?
In Fall 2022, Openscapes ran 4 Cohorts, each with its own GitHub repo with 40 team members and 40 uniquely named Markdown files, plus 5 detailed Google Doc lesson agendas, each listing its own start and end dates and times, discussion prompts, and links to slides. Using Kyber was a huge win, saving time and reducing errors from manual production.
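Producing dozens of uniquely named files by hand is exactly the kind of repetitive, error-prone step that benefits from automation. Below is a minimal Python sketch, assuming a hypothetical roster of participant names and a hypothetical `<cohort>_<name>.md` naming scheme; it is not necessarily how Kyber names files. Names are reduced to filename-safe slugs, and any collision gets a numeric suffix so every file name in the cohort stays unique.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase, strip accents, and replace runs of non-alphanumerics with '-'."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def unique_filenames(cohort: str, names: list[str]) -> list[str]:
    """Return one Markdown file name per participant, guaranteed unique within the list."""
    used: set[str] = set()
    filenames = []
    for name in names:
        base = f"{slugify(cohort)}_{slugify(name)}"
        candidate, n = base, 1
        # Keep bumping the numeric suffix until the name has not been used yet.
        while f"{candidate}.md" in used:
            n += 1
            candidate = f"{base}-{n}"
        used.add(f"{candidate}.md")
        filenames.append(f"{candidate}.md")
    return filenames
```

For example, `unique_filenames("Fall 2022", ["Zoë Smith", "Sam Lee", "Sam Lee"])` returns `["fall-2022_zoe-smith.md", "fall-2022_sam-lee.md", "fall-2022_sam-lee-2.md"]`. Checking candidates against the set of names already issued, rather than only counting repeats, also avoids clashes when one person’s slug happens to look like another’s suffixed name.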
Beyond the nuts and bolts of tool development, we got into some more philosophical discussions. Discussions with our audiences are the best!
How do you manage your time?
As an early-career professional with a full-time job, consulting work, involvement in data science communities, and ever-evolving data science tools to keep up with, how do you manage all of that? Sean feels that working in the open a lot (on GitHub, YouTube, and websites) has magnified the perception of the work he does, because others reuse and reshare it. Philip Guo, Sean’s PhD advisor, embraces the concept of “fire and motion”: it’s better to be moving forward, even if you’re only moving forward a little bit. If you have 20 minutes to write 3 sentences of that blog post you’ve been meaning to write, do it, even if you don’t know whether it will ever be published. After trying lots of tools, Sean has settled on Notion for capturing ideas and snippets of text that he can come back to later.
Watch the recording and browse the collaborative notes for Sean’s thoughts on his motivations for taking on this kind of tool-development project, adapting Kyber’s design and some of its functions to automate other workflows, why he used R for this application rather than the coding capabilities built into Google products themselves, and how others can get into this kind of work.
Resources
- Subscribe to Monday Morning Data Science (fhdata.substack.com), your weekly dose of data news from the Fred Hutch Data Science Lab.
- R package development (context: how others can get into this kind of work)
  - Package Development: the Mechanics, and the sequel, rOpenSci: Package Development: Not Rocket Science, by Maëlle Salmon for rOpenSci
  - R Packages book by Hadley Wickham
  - bookdown: Authoring Books and Technical Documents with R Markdown by Yihui Xie
- quartificate: an R package to transform Google Docs into Quarto books, by Maëlle Salmon for rOpenSci
- A recent interview with Abby Cabunoc Mayes about open source philosophy. Abby is now at GitHub and was previously at Mozilla, where she created Moz Open Leaders.
- Jeff Leek has “the most forked GitHub repo”: more than 243,000 forks of How to share data with a statistician