1
00:00:00,840 --> 00:00:05,280
We do very much data-driven work and I am always very much of the opinion

2
00:00:05,280 --> 00:00:07,600
that we should gather our data to answer a question.

3
00:00:07,600 --> 00:00:10,320
But that very much doesn't mean that's the only

4
00:00:10,320 --> 00:00:12,480
question that data can answer.

5
00:00:12,480 --> 00:00:15,400
These experiments cost tens, hundreds of thousands,

6
00:00:15,400 --> 00:00:17,280
sometimes even millions of pounds to execute,

7
00:00:17,280 --> 00:00:21,080
so it's very important to follow the FAIR principles

8
00:00:21,080 --> 00:00:24,160
in order to make data accessible to other people,

9
00:00:24,160 --> 00:00:27,800
to enable other people to be able to take full advantage

10
00:00:27,800 --> 00:00:31,800
of the investment that patients, the public and medical charities have made

11
00:00:31,800 --> 00:00:33,920
in generating that data.

12
00:00:37,480 --> 00:00:41,200
In this project, we were interested in a cancer called multiple myeloma.

13
00:00:41,200 --> 00:00:43,360
It’s a blood cancer

14
00:00:43,360 --> 00:00:47,040
and it is a particularly heterogeneous cancer.

15
00:00:47,040 --> 00:00:50,160
And what that means is that different people have very different cancers.

16
00:00:50,160 --> 00:00:53,720
What we’re interested in is identifying the places

17
00:00:53,720 --> 00:00:58,480
where those pieces of DNA were that were controlling the expression of the genes.

18
00:00:58,480 --> 00:01:03,240
So, we are involved in generating very large datasets

19
00:01:03,240 --> 00:01:04,440
and then using those datasets

20
00:01:04,440 --> 00:01:07,200
to ask questions about how genes are regulated

21
00:01:07,200 --> 00:01:08,920
and when they're switched on and switched off

22
00:01:08,920 --> 00:01:10,840
and how that goes wrong in disease.

23
00:01:10,840 --> 00:01:14,880
The biggest challenge is almost always

24
00:01:14,880 --> 00:01:17,880
convincing people who aren't used

25
00:01:17,880 --> 00:01:21,400
to this way of working, to work this way.

26
00:01:21,400 --> 00:01:26,680
Sometimes it can be difficult to convince collaborators to release data.

27
00:01:26,680 --> 00:01:30,680
When it came to the end, I had quite a lot of work to do,

28
00:01:30,680 --> 00:01:34,080
tidying up the code, rewriting some of the code

29
00:01:34,080 --> 00:01:38,520
to get it into a state where it could be useful for other people.

30
00:01:38,520 --> 00:01:41,400
I don't think that code has to be perfect when it's released.

31
00:01:41,400 --> 00:01:45,680
I think we should be encouraging people to be releasing non-perfect code

32
00:01:45,680 --> 00:01:48,760
because otherwise lots of people will never release code,

33
00:01:48,760 --> 00:01:52,200
but it does have to be in a state where it is useful for others.

34
00:01:52,200 --> 00:01:53,720
Actually, one of the great things

35
00:01:53,720 --> 00:01:57,880
is that we can just go out there and we can use the data

36
00:01:57,880 --> 00:02:01,440
without having to be in contact with the producers.

37
00:02:01,440 --> 00:02:06,480
FAIR principles are just completely indispensable for the work we do.

38
00:02:06,480 --> 00:02:09,000
The data is so expensive to collect

39
00:02:09,000 --> 00:02:14,320
and there is so much information in the datasets we collect

40
00:02:14,320 --> 00:02:17,600
that is not necessarily relevant to the question we're asking,

41
00:02:17,600 --> 00:02:22,920
that it's just inconceivable that this data shouldn't be out there being reused.

42
00:02:22,920 --> 00:02:25,560
There's nothing more frustrating than reading a study

43
00:02:25,560 --> 00:02:29,360
and finding that you can't make use of the data from that study,

44
00:02:29,360 --> 00:02:34,000
or that you can't interpret it because you can't get the details of the analysis.

45
00:02:34,000 --> 00:02:38,680
So, I'd urge people to very much push for and maybe even help

46
00:02:38,680 --> 00:02:44,440
create discipline-specific repositories that have defined metadata

47
00:02:44,440 --> 00:02:48,040
and ways of describing metadata and ways of storing the data.

48
00:02:48,040 --> 00:02:50,000
As researchers,

49
00:02:50,000 --> 00:02:54,120
our aim is to bring knowledge into the world,

50
00:02:54,120 --> 00:02:58,520
and FAIR is a way in which we do that.

51
00:02:58,520 --> 00:03:00,760
We allow that information to get out there

52
00:03:00,760 --> 00:03:04,200
and that information to be findable, and the work that you've spent,

53
00:03:04,200 --> 00:03:09,000
that you’ve invested so much time in, to make the difference that it can make

54
00:03:09,000 --> 00:03:12,080
rather than just sitting on a dusty shelf somewhere.

55
00:03:12,080 --> 00:03:15,840
It helps us to achieve what should be our ultimate goal

56
00:03:15,840 --> 00:03:21,480
as researchers in changing the world and increasing the sum of human knowledge.