Open Source Provenance Survey (OSPREY)
Project Title: Open Source Provenance Survey (OSPREY)
Primary Supervisor: Dr Rob Baxter
Additional Supervisor(s): Mr Donald Scobbie
The OSPREY project will establish and maintain a data set of checksum, quality and identity metrics for ISO images, binary packages and files, in order to track and verify the provenance of Open Source Software (OSS).
The objective of the data set is to provide transparent, reverse engineered software supply chain management for OSS projects. A primary goal is to offer verification services based on consensus metrics so that trust in OSS can be established through open and continuous audit processes. A public blockchain may be an appropriate vehicle for such services.
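As an illustration only, the idea of verifying a file against consensus metrics might be sketched as follows. The project description does not specify a hash algorithm, a consensus rule or any source names, so the use of SHA-256, the simple-majority quorum and the function names here are all assumptions.

```python
import hashlib
from collections import Counter


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def consensus_checksum(reports, quorum=0.5):
    """Return the checksum reported by more than `quorum` of sources, else None.

    `reports` maps a hypothetical source name (e.g. a mirror) to the
    checksum that source observed. The majority rule is an assumption.
    """
    if not reports:
        return None
    checksum, votes = Counter(reports.values()).most_common(1)[0]
    return checksum if votes / len(reports) > quorum else None


def verify(path, reports, quorum=0.5):
    """Return True if the local file's digest matches the consensus checksum."""
    agreed = consensus_checksum(reports, quorum)
    return agreed is not None and sha256_of_file(path) == agreed
```

In this sketch a file is trusted only when a strict majority of independent sources report the same digest and the local copy matches it. A real service would also need to weigh how trustworthy each source is.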
The project will maintain continuous surveillance and analysis of Open Source project packages and source code using a combination of software repository mining, data science and machine learning technologies.
The core value of the project to EPCC flows from the unique security assessment capability it creates in establishing trust through composition analysis, supply chain scrutiny and origin identification in virtual machine and container images prior to their use in secure research settings.
Within the DDI/City Region Deal context, it establishes a locally curated, Internet-scale data set and data research programme that contrasts with the smaller Scottish and UK national data sets. Additional benefits expected from the project include: a genuinely global-scale data set for generic data science research; leadership and innovation in the practice of operating highly secure and trusted research environments; and engagement opportunities in Open Source Software supply chain management, security and related research at a global level.
Applicants should hold a UK 2:1 honours degree, or its international equivalent, in a relevant subject such as computer science and informatics, physics, mathematics, engineering, biology, chemistry or geosciences.
You must be a competent programmer in at least one of C, C++, Python, Fortran or Java, and should be familiar with mathematical concepts such as algebra, linear algebra, probability and statistics.
Student Recommended/Desirable Skills and Experience
- Open Source Software ecosystem knowledge.
- Linux systems administration and distribution experience and knowledge.
- Software security: static code analysis and ML approaches.
- Supply chain management.
- Trust and equilibrium in distributed consensus architectures.
Further Reading
Lenarduzzi, V., Tosi, D., Lavazza, L., & Morasca, S. (2019, May). Why Do Developers Adopt Open Source Software? Past, Present and Future. In IFIP International Conference on Open Source Systems (pp. 104-115). Springer, Cham.
Silic, M., & Back, A. (2016). The influence of risk factors in decision-making process for open source software adoption. International Journal of Information Technology & Decision Making, 15(01), 151-185.
Del Bianco, V., Lavazza, L., Morasca, S., & Taibi, D. (2011). A survey on open source software trustworthiness. IEEE Software, 28(5), 67-75.
Germonprez, M., Link, G. J., Lumbard, K., & Goggins, S. (2018). Eight observations and 24 research questions about open source projects: illuminating new realities. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 1-22.
The Linux Foundation OSS Supply Chain Security: https://www.linuxfoundation.org/wp-content/uploads/2020/02/oss_supply_chain_security.pdf