Data and Software Carpentry combo at Edinburgh

Author: Mario Antonioletti
Posted: 29 Aug 2016 | 10:35

Software Carpentry attendees during the shell session. Photo credit: Martin Callaghan.

With my Software Sustainability Institute hat on, I recently took part in back-to-back Data Carpentry and Software Carpentry courses sponsored by the Research Data Service here at the University of Edinburgh. The courses were held in the main University library, in a gorgeous room with a glass wall that provided a rather distracting view of the Meadows parkland.

The lead instructor role was ably filled by Martin Callaghan from Advanced Research Computing at the University of Leeds. Being a lead instructor entails extra administrative and coordination tasks, so I was quite happy to be a plain old instructor. The other instructor for the Data Carpentry course was Alexey Tarutin who, unbeknown to me, is studying on our MSc in High Performance Computing with Data Science at EPCC. Edinburgh University students were given priority, and all 24 places on these two two-day courses were booked out.

Data Carpentry

The two-day Data Carpentry course consisted of:

  • Using spreadsheets effectively
  • OpenRefine
  • Introduction to R
  • R and visualisation
  • Databases and SQL
  • Using R with SQLite
  • Managing Research & Data Management Plans

All of this training material is available online (under a CC-BY licence), but it helps to have an instructor enrich the material with their own personal experience, biases and emphasis on what they think is important.

Data Carpentry participants.

I taught the R components of the course. I like R but, as the old adage for Perl goes, "There's more than one way to do it". There is possibly too much R to cover in the time available, especially if you have the attendees type the content along with you. I always try to cover nearly all of a lesson's content; next time I may be more judicious about what I include, so that less ends up being more. It is also useful to go through these lessons yourself: even as an instructor, you end up refreshing your memory of syntax and functionality, and learning quite a bit too!

Software Carpentry

After two days of teaching we moved on to Software Carpentry. Again the course was booked out, with four of the Data Carpentry attendees following us on to Software Carpentry. This time the course covered:

  • Introduction to the Shell
  • Version Control
  • Introduction to Python
  • Using the Shell (scripts)
  • Version Control (with GitHub)
  • Open Science and Open Research

The lessons for this course are also available online under CC-BY. EPCC's Adrian Jackson joined us to teach the Python parts of the course, I taught the shell component, and Martin covered the remainder of the material. Martin split the shell and Git teaching over two days; in all the previous Software Carpentry courses I have taken part in, these topics were taught as consecutive single blocks on the first day. Staggering the material over two days was useful, especially as it gives participants time to assimilate the idea of a version control system overnight.

This was the first time I have taught the shell material. Its delivery always worries me a little because, with the usual mix of experience levels, experienced users will find the content too basic and/or the pace too slow, while those who have never seen or used the shell before will struggle if you go too fast. Knowledge of the shell is clearly essential if you want to become a proficient developer, but pitching it at the right level is always difficult.

The attendees over the two days appear to have been a mixture of PhD students, postdocs and staff members. Each day was quite intense, as there is a lot of material to cover, but I hope they enjoyed attending the course as much as I enjoyed teaching it.

Data Carpentry teaches researchers basic concepts, skills and tools for working more effectively with data. Software Carpentry teaches researchers basic lab skills for scientific computing.


Mario Antonioletti, EPCC

