WEBVTT 00:00:00.000 --> 00:00:08.640 align:middle line:90% 00:00:08.640 --> 00:00:11.520 align:middle line:84% Hi Matthew, so we're here at a library, 00:00:11.520 --> 00:00:13.020 align:middle line:84% and we're here to talk to Matthew 00:00:13.020 --> 00:00:15.630 align:middle line:84% who is a librarian here about data management. 00:00:15.630 --> 00:00:19.380 align:middle line:84% So first off, how would you describe data management? 00:00:19.380 --> 00:00:22.920 align:middle line:84% Basically it's the process of reviewing everything 00:00:22.920 --> 00:00:24.855 align:middle line:84% you do around organising, documenting, 00:00:24.855 --> 00:00:25.980 align:middle line:90% and working with your data. 00:00:25.980 --> 00:00:27.780 align:middle line:84% In the research project, basically. 00:00:27.780 --> 00:00:29.700 align:middle line:84% And when we talk about data, we can see things 00:00:29.700 --> 00:00:33.330 align:middle line:84% like the observational data you get from instruments. 00:00:33.330 --> 00:00:36.610 align:middle line:84% Or experimental data, or computational data 00:00:36.610 --> 00:00:39.930 align:middle line:84% that comes from research or-- sorry, computer programmes. 00:00:39.930 --> 00:00:41.370 align:middle line:90% And that sort of thing. 00:00:41.370 --> 00:00:44.010 align:middle line:84% But also records, so it can be the administrative records 00:00:44.010 --> 00:00:45.430 align:middle line:84% you're creating for your project. 00:00:45.430 --> 00:00:48.330 align:middle line:84% Or it can be other records that you are maybe using as a data 00:00:48.330 --> 00:00:50.050 align:middle line:90% source, and that sort of thing. 00:00:50.050 --> 00:00:53.220 align:middle line:84% So anything that goes into our movement analysis projects 00:00:53.220 --> 00:00:54.450 align:middle line:90% is classified as data. 00:00:54.450 --> 00:00:55.350 align:middle line:90% Yeah, basically.
00:00:55.350 --> 00:00:56.920 align:middle line:90% OK, that makes sense. 00:00:56.920 --> 00:01:00.930 align:middle line:84% So what components are involved in data management? 00:01:00.930 --> 00:01:04.350 align:middle line:84% Well basically you're going to be including things 00:01:04.350 --> 00:01:06.600 align:middle line:84% about-- how are you going to be organising your files, 00:01:06.600 --> 00:01:07.830 align:middle line:90% organising your data? 00:01:07.830 --> 00:01:09.210 align:middle line:84% How are you going to be collecting your data, 00:01:09.210 --> 00:01:10.835 align:middle line:84% and the different data sets that you're 00:01:10.835 --> 00:01:12.160 align:middle line:90% going to be working with. 00:01:12.160 --> 00:01:14.570 align:middle line:84% It's also going to be looking at where you're 00:01:14.570 --> 00:01:15.820 align:middle line:90% going to be storing your data. 00:01:15.820 --> 00:01:18.320 align:middle line:84% Is it going to be stored in the cloud, is it going to be stored 00:01:18.320 --> 00:01:19.990 align:middle line:90% in some other kind of solution? 00:01:19.990 --> 00:01:22.210 align:middle line:84% How you maintain the quality of your data. 00:01:22.210 --> 00:01:23.730 align:middle line:84% So kind of the process you think through around that. 00:01:23.730 --> 00:01:25.560 align:middle line:84% How are you going to comply with legal standards 00:01:25.560 --> 00:01:26.280 align:middle line:90% and requirements? 00:01:26.280 --> 00:01:28.860 align:middle line:84% If for example, you're working with copyrighted information, 00:01:28.860 --> 00:01:31.585 align:middle line:84% or if you're working with private information, 00:01:31.585 --> 00:01:33.210 align:middle line:84% or classified information, to make sure 00:01:33.210 --> 00:01:34.570 align:middle line:90% you have a plan for that.
00:01:34.570 --> 00:01:38.730 align:middle line:84% And basically, how are you going to be 00:01:38.730 --> 00:01:41.370 align:middle line:84% kind of-- just working with your data throughout the process. 00:01:41.370 --> 00:01:41.880 align:middle line:90% OK. 00:01:41.880 --> 00:01:44.940 align:middle line:84% So I'm assuming the process is some sort of a cycle? 00:01:44.940 --> 00:01:46.830 align:middle line:84% I've heard of a data lifecycle, 00:01:46.830 --> 00:01:48.100 align:middle line:90% could you please explain it? 00:01:48.100 --> 00:01:49.560 align:middle line:90% Sure. 00:01:49.560 --> 00:01:53.653 align:middle line:84% Yeah so basically you're starting with the planning 00:01:53.653 --> 00:01:54.320 align:middle line:90% of your project. 00:01:54.320 --> 00:01:55.410 align:middle line:84% So making sure that you kind of know 00:01:55.410 --> 00:01:56.440 align:middle line:84% what data you're going to be collecting, 00:01:56.440 --> 00:01:58.510 align:middle line:84% and what methods you're going to be using for your project. 00:01:58.510 --> 00:02:00.843 align:middle line:84% And then using that to maybe potentially search for data 00:02:00.843 --> 00:02:03.370 align:middle line:84% sets that might already exist and be available. 00:02:03.370 --> 00:02:05.100 align:middle line:84% And then actually going out and collecting your data. 00:02:05.100 --> 00:02:07.410 align:middle line:84% And what methods you're going to be using for that, and what 00:02:07.410 --> 00:02:08.910 align:middle line:84% equipment you might be needing 00:02:08.910 --> 00:02:10.495 align:middle line:90% for collecting your data. 00:02:10.495 --> 00:02:12.120 align:middle line:84% So in the case of the motion capture lab, 00:02:12.120 --> 00:02:14.745 align:middle line:84% you're looking at the motion capture cameras, and the suits, 00:02:14.745 --> 00:02:16.360 align:middle line:90% and that sort of thing.
00:02:16.360 --> 00:02:19.020 align:middle line:84% So you can also think about documenting-- 00:02:19.020 --> 00:02:21.040 align:middle line:84% if you're doing, for example, EEG-type stuff, 00:02:21.040 --> 00:02:22.350 align:middle line:84% making sure that you have the sensors 00:02:22.350 --> 00:02:24.183 align:middle line:84% and understanding what sensors you're using. 00:02:24.183 --> 00:02:26.268 align:middle line:84% What kind of outputs are coming from that. 00:02:26.268 --> 00:02:27.810 align:middle line:84% You could be-- also from eye tracking 00:02:27.810 --> 00:02:29.400 align:middle line:84% with the cameras, what kind of headsets 00:02:29.400 --> 00:02:31.440 align:middle line:84% you're using to track the eyes, or what kind of outputs 00:02:31.440 --> 00:02:32.280 align:middle line:90% come from there. 00:02:32.280 --> 00:02:34.740 align:middle line:84% If it's proprietary formats, if you're going to convert it 00:02:34.740 --> 00:02:36.060 align:middle line:90% to a nonproprietary format. 00:02:36.060 --> 00:02:40.260 align:middle line:84% So that you can then archive that in a better way. 00:02:40.260 --> 00:02:43.920 align:middle line:84% Also, like the MRI that you also have in your labs. 00:02:43.920 --> 00:02:46.680 align:middle line:84% Where you're taking the fMRI images, 00:02:46.680 --> 00:02:49.323 align:middle line:84% and how you're going to be dealing with the files 00:02:49.323 --> 00:02:51.240 align:middle line:84% afterwards, and what metadata is required, 00:02:51.240 --> 00:02:53.118 align:middle line:90% for example, for that. 00:02:53.118 --> 00:02:55.410 align:middle line:84% After that you're going to be cleaning, and processing, 00:02:55.410 --> 00:02:57.670 align:middle line:90% and documenting that data.
00:02:57.670 --> 00:02:59.380 align:middle line:84% So making sure that the quality's there, 00:02:59.380 --> 00:03:03.670 align:middle line:84% and it's actually going to be suitable for analysis, 00:03:03.670 --> 00:03:04.765 align:middle line:90% which is the next step. 00:03:04.765 --> 00:03:06.640 align:middle line:84% Which also then produces its own set of data, 00:03:06.640 --> 00:03:08.515 align:middle line:84% that you have to manage appropriately, right? 00:03:08.515 --> 00:03:10.620 align:middle line:84% So it kind of goes back to this whole cleaning, 00:03:10.620 --> 00:03:13.140 align:middle line:84% and processing, and documenting process. 00:03:13.140 --> 00:03:15.600 align:middle line:84% So then after analysis, what you're going to be doing 00:03:15.600 --> 00:03:16.875 align:middle line:90% is sharing that data openly. 00:03:16.875 --> 00:03:18.750 align:middle line:84% Because that's more and more of a requirement 00:03:18.750 --> 00:03:20.970 align:middle line:84% these days, with funders requiring that you 00:03:20.970 --> 00:03:22.272 align:middle line:90% share your data sets openly. 00:03:22.272 --> 00:03:23.730 align:middle line:84% And then archiving it somewhere 00:03:23.730 --> 00:03:25.320 align:middle line:90% to make it publicly available. 00:03:25.320 --> 00:03:28.050 align:middle line:84% Well that's-- the parts that can be made publicly available, 00:03:28.050 --> 00:03:30.420 align:middle line:90% made publicly available. 00:03:30.420 --> 00:03:31.230 align:middle line:90% That makes sense. 00:03:31.230 --> 00:03:33.510 align:middle line:84% I assume there's some specific documents?
00:03:33.510 --> 00:03:37.200 align:middle line:84% Now I know that each group may have certain ones, 00:03:37.200 --> 00:03:39.810 align:middle line:84% or each institution might require different ones, 00:03:39.810 --> 00:03:42.930 align:middle line:84% but what are the kind of main types of documents 00:03:42.930 --> 00:03:44.520 align:middle line:90% that you might have? 00:03:44.520 --> 00:03:46.390 align:middle line:84% As far as the data management plan? 00:03:46.390 --> 00:03:47.520 align:middle line:90% Yeah. 00:03:47.520 --> 00:03:50.775 align:middle line:84% So that's kind of this one document, 00:03:50.775 --> 00:03:52.650 align:middle line:84% that's kind of the most important around data 00:03:52.650 --> 00:03:53.150 align:middle line:90% management. 00:03:53.150 --> 00:03:55.140 align:middle line:84% Everybody calls it a data management plan. 00:03:55.140 --> 00:03:57.497 align:middle line:84% And it kind of captures all these different components 00:03:57.497 --> 00:03:59.830 align:middle line:84% of the data management process that you're looking for. 00:03:59.830 --> 00:04:03.460 align:middle line:84% So you're looking at describing in there, 00:04:03.460 --> 00:04:06.078 align:middle line:84% specifically your data sets, and then using it to talk about 00:04:06.078 --> 00:04:07.620 align:middle line:84% organisation, and that sort of thing. 00:04:07.620 --> 00:04:08.880 align:middle line:84% And it should be a live document, something 00:04:08.880 --> 00:04:10.110 align:middle line:90% that's updated over time. 00:04:10.110 --> 00:04:12.680 align:middle line:84% You may not know at the very beginning of your research 00:04:12.680 --> 00:04:15.180 align:middle line:84% what data sets or what equipment you're going to be needing.
00:04:15.180 --> 00:04:19.110 align:middle line:84% But it should be a process, and should be a planning process, 00:04:19.110 --> 00:04:21.000 align:middle line:84% that then leads to creating this plan that 00:04:21.000 --> 00:04:22.570 align:middle line:90% gets updated over time. 00:04:22.570 --> 00:04:26.190 align:middle line:84% OK so if say, I win the lottery halfway through my project, 00:04:26.190 --> 00:04:27.480 align:middle line:90% and-- 00:04:27.480 --> 00:04:28.140 align:middle line:90% You quit. 00:04:28.140 --> 00:04:28.830 align:middle line:90% --just quit. 00:04:28.830 --> 00:04:31.920 align:middle line:84% Somebody else can come update the data management plan, 00:04:31.920 --> 00:04:33.558 align:middle line:84% and keep continuing on my research. 00:04:33.558 --> 00:04:35.100 align:middle line:84% They can basically take your project, 00:04:35.100 --> 00:04:37.440 align:middle line:84% and continue with it, because everything will be-- 00:04:37.440 --> 00:04:39.383 align:middle line:84% well, should be documented, hopefully. 00:04:39.383 --> 00:04:41.550 align:middle line:84% And they can understand all the different files 00:04:41.550 --> 00:04:44.830 align:middle line:84% and analysis you conducted on your research up to that point. 00:04:44.830 --> 00:04:46.980 align:middle line:84% So they can just pick it up, and go. Well, maybe 00:04:46.980 --> 00:04:50.660 align:middle line:84% after a month or two of reviewing everything, but yeah. 00:04:50.660 --> 00:04:52.480 align:middle line:84% Yeah, they should be able to do that. 00:04:52.480 --> 00:04:55.787 align:middle line:84% OK, what's the best way to get started with data management? 00:04:55.787 --> 00:04:57.120 align:middle line:90% Contacting your resource person. 00:04:57.120 --> 00:04:59.940 align:middle line:84% So asking your colleagues, who should I talk to?
00:04:59.940 --> 00:05:02.730 align:middle line:84% Often the university library will be a very good place 00:05:02.730 --> 00:05:06.600 align:middle line:84% for that, because they'll have some aspect of training 00:05:06.600 --> 00:05:07.670 align:middle line:90% in data management. 00:05:07.670 --> 00:05:09.847 align:middle line:84% So they're usually a good place to start. 00:05:09.847 --> 00:05:11.680 align:middle line:84% If not, there should be a local data manager 00:05:11.680 --> 00:05:12.700 align:middle line:90% that you can usually contact. 00:05:12.700 --> 00:05:14.590 align:middle line:84% They can also help you with some of the more-- 00:05:14.590 --> 00:05:16.313 align:middle line:84% or a lab manager like yourself, who 00:05:16.313 --> 00:05:18.230 align:middle line:84% can help you with some of the process as well. 00:05:18.230 --> 00:05:21.467 align:middle line:84% So if not, there usually are some web pages available, 00:05:21.467 --> 00:05:23.050 align:middle line:84% and there's also a bunch of resources, 00:05:23.050 --> 00:05:25.300 align:middle line:84% I'm sure, at the bottom of this video where you can-- 00:05:25.300 --> 00:05:26.920 align:middle line:84% Yeah, yeah so we'll put resources 00:05:26.920 --> 00:05:29.770 align:middle line:84% in the links of our articles, as well. 00:05:29.770 --> 00:05:31.420 align:middle line:84% All right great, is there anything else 00:05:31.420 --> 00:05:34.050 align:middle line:84% that people need to know to get started with data management? 00:05:34.050 --> 00:05:35.900 align:middle line:84% Make sure you get that plan in place, 00:05:35.900 --> 00:05:38.025 align:middle line:84% and make sure you talk to your colleagues about it. 00:05:38.025 --> 00:05:39.690 align:middle line:84% OK, all right, thank you for your time. 00:05:39.690 --> 00:05:41.400 align:middle line:90% Thank you. 00:05:41.400 --> 00:05:48.288 align:middle line:90%