1
00:00:07,200 --> 00:00:13,480
My research focuses on learning from multiple sources of data for better prediction.

2
00:00:13,480 --> 00:00:21,680
We develop translational AI technologies to better analyse multi-modal data in health care,

3
00:00:21,680 --> 00:00:24,680
including medical images, genomics data,

4
00:00:24,720 --> 00:00:28,400
chemical data and electronic health record data.

5
00:00:28,400 --> 00:00:31,320
From this data, we learn useful features

6
00:00:31,320 --> 00:00:34,720
and also analyse their complex relationships.

7
00:00:34,720 --> 00:00:38,160
The PyKale project aims to develop

8
00:00:38,160 --> 00:00:41,520
a software package for machine learning

9
00:00:41,520 --> 00:00:46,480
from multiple sources of data for interdisciplinary research.

10
00:00:46,480 --> 00:00:50,120
We formulated green machine learning principles

11
00:00:50,120 --> 00:00:55,200
to reduce repetition, reuse existing resources

12
00:00:55,200 --> 00:00:59,760
and recycle machine learning models across disciplines.

13
00:00:59,760 --> 00:01:04,520
We designed a pipeline-based API to standardise

14
00:01:04,520 --> 00:01:10,000
all machine learning workflows into the same six steps.

15
00:01:10,000 --> 00:01:15,680
We take the FAIR approach because we rely on open-source software

16
00:01:15,680 --> 00:01:20,200
in both our teaching and research activities.

17
00:01:20,200 --> 00:01:23,960
Therefore, we want to become part of it.

18
00:01:23,960 --> 00:01:29,680
In September 2021, our PyKale software package

19
00:01:29,680 --> 00:01:32,280
was officially approved

20
00:01:32,280 --> 00:01:34,800
to become a member

21
00:01:34,800 --> 00:01:38,400
of the world-leading PyTorch ecosystem.

22
00:01:38,400 --> 00:01:41,720
There are many barriers and challenges.

23
00:01:41,720 --> 00:01:45,360
The most challenging one is the time

24
00:01:45,360 --> 00:01:49,120
and effort needed to make things happen.

25
00:01:49,120 --> 00:01:56,200
To overcome this, we had to align the work with our other tasks.

26
00:01:56,200 --> 00:01:59,720
What we did was to blend it

27
00:01:59,720 --> 00:02:04,680
with our other teaching and research activities.

28
00:02:04,680 --> 00:02:09,360
In retrospect, I wish we had started

29
00:02:09,360 --> 00:02:14,880
some of the necessary tasks much earlier, such as testing.

30
00:02:14,880 --> 00:02:20,400
It seemed quite difficult before we actually started doing it.

31
00:02:20,400 --> 00:02:23,640
But once we had done a few examples,

32
00:02:23,640 --> 00:02:26,840
it turned out to be relatively easy.

33
00:02:26,840 --> 00:02:30,280
We should spend less time worrying

34
00:02:30,280 --> 00:02:32,960
and more time coding.

35
00:02:32,960 --> 00:02:37,520
Having built this open-source software package,

36
00:02:37,520 --> 00:02:42,960
we find it much easier to share our research.

37
00:02:42,960 --> 00:02:46,760
For example, anybody with a Google account

38
00:02:46,760 --> 00:02:51,840
can run our real-world machine learning examples

39
00:02:51,840 --> 00:02:54,640
on the cloud in their browser

40
00:02:54,640 --> 00:02:57,360
without any local installation.

41
00:02:57,360 --> 00:03:01,320
This enables us to bring more students

42
00:03:01,320 --> 00:03:04,080
and researchers on board

43
00:03:04,080 --> 00:03:10,040
to either contribute to its development or make use of it.

44
00:03:10,040 --> 00:03:13,800
In our collaboration with AstraZeneca,

45
00:03:13,800 --> 00:03:19,320
we agreed from the beginning that all our development work

46
00:03:19,320 --> 00:03:23,720
would be made open source under the PyKale framework.

47
00:03:23,720 --> 00:03:29,880
The FAIR principles will continue to play a vital role in my research.

48
00:03:29,880 --> 00:03:34,720
I embed these principles not only in my research,

49
00:03:34,720 --> 00:03:37,800
but also in my teaching and service.

50
00:03:37,800 --> 00:03:45,280
All my teaching materials are open source and freely available on GitHub.

51
00:03:45,280 --> 00:03:50,880
In my role as the Turing Network Development Award lead,

52
00:03:50,880 --> 00:03:55,120
we created a Sheffield AI website

53
00:03:55,120 --> 00:03:58,600
as an open-source project

54
00:03:58,600 --> 00:04:03,240
where everyone can contribute

55
00:04:03,240 --> 00:04:07,280
and participate in its development.

56
00:04:07,280 --> 00:04:12,840
If you are a researcher wishing to generate real-world impact

57
00:04:12,840 --> 00:04:18,720
from your research, you should definitely follow the FAIR principles.

58
00:04:18,720 --> 00:04:22,520
Making your data and software FAIR

59
00:04:22,520 --> 00:04:25,280
is the most effective

60
00:04:25,280 --> 00:04:29,200
and efficient way to generate impact.

61
00:04:29,200 --> 00:04:33,360
If you are not sure how to do it,

62
00:04:33,360 --> 00:04:40,520
the best way is to find good examples that share similarities

63
00:04:40,520 --> 00:04:43,360
with what you want to do,

64
00:04:43,360 --> 00:04:46,640
learn from these good examples,

65
00:04:46,640 --> 00:04:51,720
and become one of them over time.