Like many folks, our ability to host in-person training and research projects evaporated for summer 2020. Unlike many folks, our bioinformatics research is entirely computational and can be conducted from anywhere with a terminal interface and reasonable internet. With support from Titus and the lab, new labbie Hannah (PhD student in Microbiology) and I set out developing an online research program for students as part of the National Summer Undergraduate Research Project (NSURP) program. NSURP emerged as a response to COVID restrictions and pervasive white supremacy in academia, and is aimed at providing rewarding summer research opportunities for BIPOC undergraduate students in the microbial sciences. These are goals near and dear to our hearts, and we felt we could provide solid training and research experience through this program.
tl;dr
Remote bioinformatics research training is highly feasible over Zoom! In our experience, it required significant time investment (especially for computational training up front), but that investment paid dividends later.
dib-lab NSURP project; doi: 10.5281/zenodo.4041204
The Project
In the dib-lab, we do a mix of bioinformatic tool development and biology-focused research using those tools. Our NSURP students were interested in human health, and we have several projects surrounding a reanalysis of microbial data from patients with Irritable Bowel Disease (IBD), spearheaded by graduate student Taylor Reiter. Like many wet lab projects, there is significant training that needs to be done before students have the background to do this sort of work - it would not be productive for any of us to throw our students in the deep end and expect them to be able to immediately generate novel research results or even curate data or metadata for us.
We started with a technical bootcamp: over nearly two weeks, we helped them install terminal apps (windows), learn basic unix commands, log in to a remote HPC, keep a detailed computational notebook using HackMD, and run through a tutorial of our sourmash k-mer analysis software. We included background biology and ran through some of the biological basis for how k-mer analysis can be used for taxonomic identification of metagenome samples. We began with daily 1-hour zoom meetings, and then switched to twice-weekly 2-hour meetings for the project portion, below.
To enable research bioinformatics training, we built a mini research project with a very small subset of the IBD data - not enough for novel research results, but plenty for understanding the challenges involved in running this sort of analysis, and learning to troubleshoot your way through. Over the course of the project, the students practiced navigating the command line, managing software installations using conda environments, executing bioinformatic tools, and interpreting the results of each analsyis step. After getting through the entire workflow, we provided additional samples that would enable a more robust analysis, and the students ran through the entire workflow again on their own.
After executing all these steps manually (keeping and using detailed computational notebooks), the benefit of computational workflows became abundantly clear. We introduced automation with bash and snakemake, and worked through building and running a Snakefile for each step using the commands they’d previously worked through. Finally, we introduced version control with git and github, and encouraged the students to upload all project files and notes to create a well-documented, repeatable workflow.
By the end of the project, both students were navigating the commandline confidently and were able to complete the experimental challenge exercises independently! We were impressed with the amount of coding and troubleshooting techniques they picked up over such a short period of time.
What worked well?
We live-coded all tutorials and used Zoom screensharing to help troubleshoot any issues that cropped up during the modules. We developed Carpentries-style learning modules that included all code and information we’d be using, and then ran through them together, with additional coding and biological explanations along the way. The modules and data seemed to work well!
We were available on Slack for questions and encouraged our students to work together to complete the challenge exercises. We reviewed their computational notebooks and gave feedback for improvements along the way.
What could have gone better?
Project Timing and Organization: Setting up HPC accounts took a bit longer than we thought, and the students would have appreciated get going on actual data analysis sooner.
Enabling Independent Research is difficult: Even though we posted all training modules on our training website, we ran through these modules together (over Zoom) as a carpentries-style lesson, rather than having the students run through them first. Partially, this is to prevent speeding through (as it’s all copy-pasta!), which often leads to accidentally missing important steps, and an incomplete understanding the importance of each step. Going through code together enables us to build understanding of what each piece of code does, rather than just ensuring we produce output files. The downside of this is that for the first half of the project, students can only make progress during our Zoom meetings, so their progress is limited to the amount of time we can dedicate to training on Zoom. I think with a larger group of students, particularly graduate students, this could be flipped towards requiring students to complete the lessons on their own, and using coworking meeting times for covering larger concepts and completing challenge exercises. At minimum, writing challenge exercises into the earlier training modules and providing additional biological learning opportunities could provide additional coding and troubleshooting time for the students outside of our Zoom availability.
Make time for additional biological training and discussion: It would be helpful to spend additional time on biological understanding along the way. We in the dib-lab are more used to training scientists that already have specific biological questions that need the coding skills to analyse their own data. It is worth realizing that our undergraduates need both, though I’m not entirely how to balance the training time required to build computational skills. Perhaps a 1-hour biology-focused discussion each week would do the trick.
Set output goals at the beginning: Now that we know what can be done in this time, providing additional structure for their lab reports and individual challenges would be good. We might consider introducing git and github earlier to encourage organized project progress. If we had more time, building an Rmarkdown report (instead of manually integrating graphics into a Hackmd) could be a useful addition.
What did we get out of it?
Hannah and I both devoted significant time to this project, sometimes at the expense of moving forward our own research. However, we both found it a very valuable experience, and are proud and excited with the progress our students made in such a short time.
Hannah’s thoughts:
I wanted to sign up as a mentor for the NSURP program because it was an opportunity to provide the internship experience that I wanted in undergrad but couldn’t find. Even though I was really curious about bioinformatics for years, I didn’t take the plunge into computational work until starting graduate school last fall. Since then I have been on the receiving end of limitless, invaluable one-on-one teaching for months, so when there was an opportunity to provide some level of service to others it sounded really refreshing. Also, I was excited to have more face to face meetings about science since COVID had me working alone a lot of the time and I missed talking to people.
I reached out to Tessa about working as co-mentors because I didnt feel like I had a solid enough foundation to take on a student alone, but I did have a good amount of time to dedicate to working with them. I’m so glad she agreed because I way underestimated the amount of material we could get through in a few weeks. She wrote up detailed tutorials that I was able to walk through with our two students, and those documents anchored each of our meetings so that when issues popped up we could work through them and get back on track. The also organized the project as a whole, making it clear how each lesson was a part of the larger project.
I was really happy with how zoom worked out for this project. I think in part because classes moved online for spring quarter, both students were more comfortable (even if by force) with learning in an online format. However, I think 3 students would have been the limit for the synchronous format to work smoothly. With two students, we had them screen share to work out errors, and this was very feasible with 2, but would have been lengthy with 4 or more, which would probably require split break out rooms instead of one group.
Before this internship started, I had just finished Taylor’s sourmash rotation project. Walking through the NSURP tutorials with students was useful for me because it deepened my understanding of a lot of the tools used in our lab.
** The program wanted undergrads to “do research” and this was a good balance of getting them up to speed on bioinformatics, and also letting them do some research-y thinking. We trained more than we did biology.
If we were to do this again next summer I would add some more biology background in addition to training. Probably in some kind of curated reading/video list, with secondary sources like slide sets or recorded lectures that they could work though on their own, since theoretically NSURPers have more biology background.
Tessa’s thoughts:
I was really excited to offer a research experience to NSURP students, but a bit worried about the time required to do it properly. Luckily, Hannah was interested too, and we could combine our efforts. Good news, I think it went really well!
My main goal was to develop a project that would work to train undergraduate biology students (rising 3rd and 4th years) in bioinformatics techniques and research. Along the way, I also wanted to guide Hannah (still a relatively new dib-labber) towards a deeper understanding of our analysis techniques and help her gain experience teaching ANGUS and Carpentries-style lessons (including building a friendly and accessible environment for learning).
I drew heavily on the lab’s rotation and workshop lessons (many of which I worked on and with in prior years) to build the NSURP lesson modules, and attempted to slowly add information and technical complexity as the project progressed. Some of this is reflected directly in the lesson modules, but a large part was added on Zoom, during lulls created during code runtime/ other downtime, or directly from student questions.
Unfortunately, because we were spending time building and modifying training modules as we went, and because I’m still pretty new to microbiology, we didn’t really connect with the larger NSURP community much (over slack). I think this was a bit of a missed opportunity, but hope to engage more if the project continues next year.
Some thoughts on the bigger picture: coding and bioinformatics are still heavily white and male. Friendly, accessible training can help lower the barrier to entry, particularly for underrepresented groups, and is highly feasible over Zoom. I see this as one way to contribute to building the culture and communities I want to see in academia. But research training and exposure is a small piece of the puzzle and must be combined with systematic work combating white supremacy culture in academia.
Final Conclusions
All together, I think this was a highly successful undergraduate research training experience. At the end of the project, the students had a good understanding of the importance of a computational lab notebook, how to execute bioinformatic analyses, how to troubleshoot, and produced a set of preliminary results for an interesting biological question. One of our students even received an Honorable Mention for her final report! But most importantly, I feel confident that these students could start bioinformatics research on an ongoing project, keep a record of their work, troubleshoot some of their own errors, and ask for help as needed. So while our project was more focused on “training” than “research” outcomes, I think it provided a strong introduction to bioinformatic research.
Our training materials are available as a website (https://dib-lab.github.io/2020-NSURP; doi: 10.5281/zenodo.4041204). Let us know if you find them useful. Also check out all the incredible NSURP presentations on the NSURP website here!
Tessa (and Hannah!)