1 00:00:06,560 --> 00:00:08,640 My research focuses on the relationship 2 00:00:08,640 --> 00:00:10,520 between lighting and active travel. 3 00:00:10,520 --> 00:00:13,960 We look particularly at certain aspects of lighting such as illuminance 4 00:00:13,960 --> 00:00:18,560 or the distribution of light, how spatially uniform it is, 5 00:00:18,560 --> 00:00:21,960 and how that affects things like the numbers of people cycling 6 00:00:21,960 --> 00:00:25,200 or how safe people feel when they're cycling at night. 7 00:00:25,200 --> 00:00:27,600 In the research, we took a number of different sets of data, 8 00:00:27,600 --> 00:00:32,400 so we had data about the number of people cycling in Birmingham, in the UK, 9 00:00:32,400 --> 00:00:35,000 and we compared that with information 10 00:00:35,000 --> 00:00:38,760 about the locations of street lighting on different streets 11 00:00:38,760 --> 00:00:43,000 and also information about the illuminance on those streets. 12 00:00:43,000 --> 00:00:46,280 So we were able to get night time aerial photography 13 00:00:46,280 --> 00:00:51,040 from an open source website and use those night time 14 00:00:51,040 --> 00:00:54,880 aerial images to estimate how bright streets were around 15 00:00:54,880 --> 00:00:58,400 these counters that were counting the number of people cycling 16 00:00:58,400 --> 00:01:00,760 and then sort of compare the relationship there between 17 00:01:00,760 --> 00:01:02,040 the brightness of the streets 18 00:01:02,040 --> 00:01:05,400 and the number of people who were actually cycling on that street. 19 00:01:05,400 --> 00:01:10,080 So we wanted to make sure we published our data openly, 20 00:01:10,080 --> 00:01:14,520 so we made sure we published our results in an open access journal, PLOS ONE, 21 00:01:14,520 --> 00:01:17,280 and also PLOS ONE has an open data policy 22 00:01:17,280 --> 00:01:19,760 where it really encourages you to publish your data openly. 
23 00:01:19,760 --> 00:01:24,280 Because a lot of our data sources were openly available anyway and we processed them, 24 00:01:24,280 --> 00:01:27,440 we were able to share links to those open access databases 25 00:01:27,440 --> 00:01:31,720 but also include our own data as CSV files, 26 00:01:31,720 --> 00:01:34,440 which are quite interoperable 27 00:01:34,440 --> 00:01:39,720 and could be used in any kind of analytical software. 28 00:01:39,720 --> 00:01:42,840 So our own data files were published in CSV. 29 00:01:42,840 --> 00:01:48,280 We also used GIS (geographical information system) 30 00:01:48,280 --> 00:01:52,520 as part of our analysis process for the data 31 00:01:52,520 --> 00:01:56,080 and we used something called QGIS, which is an open access, 32 00:01:56,080 --> 00:01:59,120 open source, freely available piece of software. 33 00:01:59,120 --> 00:02:02,960 So, all the data files that were related to the GIS analysis 34 00:02:02,960 --> 00:02:07,200 were also interoperable and available 35 00:02:07,200 --> 00:02:11,960 to use through this kind of open source GIS software as well. 36 00:02:11,960 --> 00:02:15,160 We also wanted to really make it clear 37 00:02:15,160 --> 00:02:19,440 that the analysis that we did was very transparent as well. 38 00:02:19,440 --> 00:02:23,120 So, we used software called R to analyse our data. 39 00:02:23,120 --> 00:02:27,120 And so, we were able to publish the code that we'd written to analyse our data, 40 00:02:27,120 --> 00:02:29,640 openly as well, on the PLOS ONE website. 41 00:02:29,640 --> 00:02:31,920 So, anyone could see how we'd analysed our data. 42 00:02:31,920 --> 00:02:36,600 But also, if somebody wanted to, they could use that data themselves as well.
43 00:02:36,600 --> 00:02:40,160 I think institutional support is really, really helpful, really valuable 44 00:02:40,160 --> 00:02:44,200 because I for one, and certainly I'm sure many of my colleagues 45 00:02:44,200 --> 00:02:46,560 probably aren't all that familiar with the requirements 46 00:02:46,560 --> 00:02:49,680 for actually sharing your data in a FAIR way. 47 00:02:49,680 --> 00:02:55,680 Quite often, we might think that just putting a raw CSV file out there is enough. 48 00:02:55,680 --> 00:02:59,720 But, there are lots of guiding principles about how to annotate that dataset: 49 00:02:59,720 --> 00:03:02,000 the metadata that would need to go with it. 50 00:03:02,000 --> 00:03:04,160 There's lots of best practice related to that. 51 00:03:04,160 --> 00:03:07,680 I think that's where the institution and the library can support us with that. 52 00:03:07,680 --> 00:03:10,800 One of the challenges was knowing the best 53 00:03:10,800 --> 00:03:14,120 format to use for sharing the data. 54 00:03:14,120 --> 00:03:15,920 And I think that's certainly something that I would 55 00:03:15,920 --> 00:03:17,760 do better in the future. 56 00:03:17,760 --> 00:03:19,840 I think people don't realise the time it takes 57 00:03:19,840 --> 00:03:23,640 to get your data into that format where it can be shared openly 58 00:03:23,640 --> 00:03:26,160 because quite often, when you're immersed in a project, 59 00:03:26,160 --> 00:03:28,840 you know what your data means, but you give that to anyone else 60 00:03:28,840 --> 00:03:30,200 and they'll have no understanding. 61 00:03:30,200 --> 00:03:31,360 So it takes a lot of time to get it 62 00:03:31,360 --> 00:03:34,800 into a format that could be understood by anyone. 63 00:03:34,800 --> 00:03:39,480 Commitment to that time is an important aspect of how we can do it successfully.
64 00:03:39,480 --> 00:03:41,880 In retrospect and certainly for future projects, 65 00:03:41,880 --> 00:03:47,040 it would be great to have used something like R Markdown to create the paper. 66 00:03:47,040 --> 00:03:49,920 R Markdown is a tool or a bit of software. 67 00:03:49,920 --> 00:03:53,400 It helps you provide a direct link between your data, 68 00:03:53,400 --> 00:03:55,880 your analysis script as well 69 00:03:55,880 --> 00:03:58,800 and the actual text within the paper. 70 00:03:58,800 --> 00:04:02,680 So you can really see the dynamic link between 71 00:04:02,680 --> 00:04:04,160 all three of those aspects. 72 00:04:04,160 --> 00:04:06,920 In terms of positive benefits from the project, 73 00:04:06,920 --> 00:04:09,760 it forces me to learn it and that, in turn, 74 00:04:09,760 --> 00:04:13,040 helps me pass it on to students that I'm supervising. 75 00:04:13,040 --> 00:04:16,720 So, hopefully, as they develop their research career, 76 00:04:16,720 --> 00:04:19,240 it will become the norm for them 77 00:04:19,240 --> 00:04:23,200 whereas at the moment, it's perhaps not quite the norm just yet. 78 00:04:23,200 --> 00:04:28,360 In terms of advice to a researcher who is thinking of publishing their data openly, 79 00:04:28,360 --> 00:04:29,680 or is unsure where to start, 80 00:04:29,680 --> 00:04:32,960 first of all, I certainly recommend speaking to the guys in the library 81 00:04:32,960 --> 00:04:36,280 who really know best practice on how to do this and could give 82 00:04:36,280 --> 00:04:39,320 some practical examples of how and where to do this. 83 00:04:39,320 --> 00:04:41,840 Ultimately, I think it’s just taking that first step. 84 00:04:41,840 --> 00:04:44,840 Nobody's going to do it perfectly the first time. 85 00:04:44,840 --> 00:04:46,160 But I think just making that effort 86 00:04:46,160 --> 00:04:50,640 and showing the willingness to share your data openly is the best 87 00:04:50,640 --> 00:04:52,160 first step you can make.