Gareth O’Neill on the Open Science Cloud, Metadata, and Impact
- Podcast
- August 2, 2022
Will research publications exist in the future? Nikesh Gosalia and Gareth O’Neill continue their discussion, talking about the evolution of research articles into nanopublications and the role of commercial publishers in open science. Gareth shares his philosophy on science as a way to get to the truth and provides an overview of the open science efforts in different countries to illustrate the global nature of the open science cloud. He talks about making research accessible to everyone, using metadata to break down language barriers.
Nikesh and Gareth also tackle the topic of impact in research, and how it is changing from simply the journal’s Impact Factor to societal impact outside academia. Gareth emphasizes the need for a multilayered impact ranking system as well as the importance of supporting research on both theoretical and real-world problems. Ending on a light note, they chat about Gareth’s passion for sailing.
Gareth O’Neill is the Principal Consultant on Open Science at the Technopolis Group and a doctoral candidate for theoretical linguistics at Leiden University. As the Former President of the European Council of Doctoral Candidates and Junior Researchers, Gareth is a renowned expert on open science for the Dutch Government and the European Commission. Reach him on Twitter.
Missed Parts 1 and 2 of the conversation? Listen to them here.
Nikesh Gosalia
I am very tempted to ask, Gareth: in, say, 20 to 25 years, or maybe even sooner, do you see a future for the research article, or will it not exist? I mean I know we’ve spoken a lot about the data and making that more open and that being more important. Will there be a research article?
Gareth O’Neill
Yes. I think a research article will always be needed. And what I mean by a research article really is a summary of my findings on a dataset. I’ll keep it to that definition. We could argue about like, maybe not need to be always focused on data, especially if you are looking forward and proposing something. But just for now, an article summarizing my results of a dataset, right. We could argue about the level or how many pages that should be, if it’s even pages in the future, and not symbols. But anyway. But what I would like to bring forward is the concept of what’s called the ‘nanopublication’ which is also linked a lot to FAIR data I think. So, a nanopublication is in linguistic terms, I guess, a key assertion, a fact, a statement of facts such as malaria is transmitted by mosquitoes, right. That’s a nanopublication. What that means is I’ve made a statement. I can describe it with metadata so that a machine can find it and refer to it and I can link it, just like a publication.
But what’s crucial here is it’s also linked to a dataset, right, or an article, a bigger article, where I can go check is it true. Now, that assertion should be peer reviewed, the article should be peer reviewed, and the assertion should be peer reviewed. So, the people reviewing that article saying this is a good article, this is truthful, compared to the dataset, agree that the assertion ‘malaria is transmitted by mosquitoes’ is true for that article and that research data, right. So, take that and push that on the side for a second. Now take that assertion and link it to your dataset, in metadata. It’s what’s called a ‘digital object,’ a FAIR digital object. It can be referred to. It can be linked. And a machine can search for it. A machine can immediately collect all the assertions ‘malaria is transmitted by mosquitoes’ across every dataset if it’s linked there, right.
So, for me, the future of publications is probably they get smaller. They are immediately linked to data. I do think in the future we’ll be able to interact with data on a completely different level in different ways. But I think the article always stands as that summary of what the hell is in this dataset. You could argue that we could get rid of the publication and have live tools that can essentially generate an article from a dataset; that’s highly likely in the future. But again, somebody has to peer review it. That will be a human. And somebody has to agree on the assertions that that article is making. Is it truthful? Now you read an article, right, and you have conclusions, which are usually not bullet point assertions. I am actually talking about bullet point, systematically linguistic assertions that a machine understands. This is the outcome of my article. Malaria is transmitted in this temperature range, in this humidity range, by this species of mosquito: these specific assertions.
They are types of articles in themselves. Hop to the dataset, you immediately can see what this dataset has done. The role of publishers in that future, that I don’t know. I guess in your context you are talking about commercial publishers. Obviously, the publisher is the researcher, they publish their research. There is a body that helps them do that, maybe curates it and manages it and preserves it. And that could be a commercial publisher, but not necessarily. And I think the commercial publishers need to start thinking about the future, and I do think they are thinking about the future. You know, we have five mega publishers that control to a large extent the academic publishing ecosystem. We have Elsevier, Springer Nature, Taylor & Francis, Wiley, and I think it’s SAGE and they are commercial in most cases. But you also have nonprofit learned societies acting as publishers, and some of them are enormous like the American Chemical Society.
And their practices have not always been very good, if I can put it that way. They have in many cases worked against researchers. They have in many cases worked against open access, open science. So, I think they need to evaluate their mission in the academic publishing ecosystem and I think they need to evaluate their business models for the future and get in line with what at least Europe is doing in terms of opening up and start thinking ahead of the future in terms of the future of publications and the future of connected linked accessible machine actionable data.
And for the public good, I’ll put that in like high levels of highlighting ‘for the public good’ not strictly for commercial profit, and certainly not for dual use technology although of course that will be there at a governmental level, but certainly for the public good. This should be changing lives, making our lives better, making our lives easier, and facilitating our quality of life and our perspective towards the future.
Nikesh Gosalia
And while you mentioned Europe, I know for a fact that in terms of thinking about the policies, there is quite a bit of advancement. But as we can see from general trends, a lot of researchers are also from non-native English-speaking countries like the ones we spoke about, Japan, Korea, let’s even talk about China, and just the sheer output is growing. What is your take on that, Gareth, in the sense of how we can make some of these things more global in nature, and how we tag along as partners driving this together? There are a lot of political winds.
Gareth O’Neill
So, I am an Irishman living in Amsterdam, working in Brussels, right, just to put this into perspective. So, my point is science – the world has become global. And science is no exception and should be a public, global enterprise. Research transcends boundaries and always should. Even during times of war and crisis, we see scientists are trying to work together, maybe not all but certainly the scientific community trying to work together. And that’s the whole point of science in a way. It steps beyond the here and the now and the cultural differences and the political issues, to get to the truth. And what I mean by the truth, if it’s even there, is the objective truth of our existence and the objective truth in the world, right. Whether there is a real world or not is another question, which we could keep for another podcast. But in any case, it’s the closest we can get to the truth in our physical interaction with this universe through our specific perceptual system or body. And that’s science. It’s the way to get to the truth. It’s the methodology, and the best way we have to get to the truth. And that’s working on the work of those that went before us and working with those that are around us now, right.
So, science is global, it’s not European. My focus is on Europe because I am in Europe. I know Europe best. And it’s very hard to be an expert in everything, and certainly on a global level, right. So, my focus is Europe. Now, that said, I do think that we need to be working together now in terms of open science on a global level. And we are. There’s a lot happening in different regions. If I take, for instance, the European Open Science Cloud Initiative, that’s a huge driver for open science in Europe, right. It’s connecting digital objects, data, but it’s pulling in everything else linked to open science because that’s how a train works. As you move forward, building a huge web of data like this, you are pulling in all the different aspects of science. But there are also clouds elsewhere. For instance, in Brazil, there’s an open cloud in development. There is an open cloud in China. The United States is working on its own cloud infrastructure. So, the idea really here is that we first of all communicate so that the standards and protocols that we use for the data transfer and the data and metadata descriptions are systematic and communal, right. There’s no point in us having one, the United States having another, and then we can’t do anything. So, the idea is that we essentially hook up, ultimately leading towards, if you will, a global open science cloud.
And maybe just to be clear here, and I am going to quote Karel Luyben, a good colleague, and current president of the EOSC Association, who said that EOSC, European Open Science Cloud is a terrible word. Because it’s not European, it’s starting in Europe but ultimately its goal is to link to the rest of the world, agree on standards, and create a web of data, which is non-boundary specific. It’s not open. It’s about FAIR data. It’s about machine actionability of data which could be open but does not have to be open. It’s not about science. We are starting with research. We are starting with research data, first of all, in the public sphere and academia, but it will extend to research data in the private sector, and it will extend to all data. So not just research data but, for instance, public data. You could think of transport data. You could think of medical data, and ultimately any data that’s out there as long as they comply.
Just like the internet. There is no one specific type of webpage or content. Everything is available. It’s a living organism, right. And it’s not a cloud. At its heart, it’s a set of protocols on how to share the data, on how to make the data findable, accessible, interoperable, reusable, and on how to gain access to that data. The data itself is stored at local data repositories, probably universities or companies, and they connect. And so, if you will, this open science cloud is kind of, how will I say, emergent. It emerges from the connection of the others, right. The parts extend. The whole is beyond the parts. So again, like a human body, right. You see this one entity called Gareth talking to you with an apparent personality, and an accent. But I mean at the core I am a collection of organs, a collection of cells, which somehow make this whole. And that’s what the future of this cloud is.
So, to get there, we are working on this particular EOSC in Europe, but there are other regions working on it. And the idea is ultimately to connect to datasets from outside of Europe, and to connect the clouds or whatever they call their initiatives outside of Europe, and globally, communally work on this together so that this data is out there for all researchers. I should be able to find data from India. They should be able to get access to my data from Europe, and a machine should be able to look at both and do some work to help us along that path.
Nikesh Gosalia
I must acknowledge the way you are able to explain some of these things, Gareth, in my personal experience I’ve found some of the terms, some of the policies to be extremely technical, but this is just so fascinating. I am able to connect the dots so much better than ever before. And I must acknowledge, and I must say, thank you so much.
Gareth O’Neill
Thousands, if not tens of thousands of people are doing this. Ultimately, the researchers, there are 2 million researchers in Europe alone, right. So, they are doing all this really detailed technical work, and the technicians of course are building all of this. So, there’s a huge amount of work going into this. And again, not just in Europe, colleagues from all across the world. And that’s the idea. And I get your point on different countries, different languages. We are conversing in English. And of course, a lot of the main research publications or journals are in English because it’s the lingua franca that we have in our current time. But there are many countries publishing in their own language, in their own language journals, which are overlooked because we can’t read them.
But this is where digital technology comes into play again. In the future, what’s core is that that article has metadata. And I don’t care what language it is, it could even be in a new conceptual language we create, but your articles – what’s your native language?
Nikesh Gosalia
Hindi.
Gareth O’Neill
Hindi. So, let’s say you publish an article in Hindi. I publish one in English or in Gaelic. We go back to the assertions I am talking about, and I am thinking ahead now, I don’t need to read your article in Hindi. I just need to know what your key assertions were to know if it’s relevant or not. If it’s relevant, I take the next step, and I get a future translation mechanism to translate that article for me, right. And either again it translates direct from Hindi to English, or it translates into a meta language that can then be dispersed immediately across the web into all languages. This is something Google, I am 100% sure, is working on of course. But in any case, language should not be a barrier in the future. Even now it shouldn’t be a barrier. I should be able to access the key assertions, conclusions from your article. The metadata should also be similarly easily machine translatable. And then the data itself should speak for itself, unless it’s in a language, in which case we deploy translation tools of the future, right.
So, I don’t see boundaries on languages. We don’t all need to speak English and we shouldn’t. But we should have an agreement on the metadata, on our digital research, digital objects, and on how to access that and translate it perhaps. And certainly, agreements on, how can I get your key assertions from your article to know if it’s relevant for me or not? If your article says malaria is transmitted by rabid dogs, and that’s a low number, it might still be interesting. I want to know why. Is there something here I’ve missed? And I don’t need to read your article. And I don’t need to even look at your article. A machine can just list all of the groupings of assertions, and then network analyze it to help me along with what’s relevant here.
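The machine-side grouping of key assertions described above can be sketched in a few lines of Python. This is only an illustration, not a real nanopublication format: the `Assertion` record, its field names, the identifiers such as `article-1`, and the three sample claims are all made up for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One machine-readable claim, linked to the article and dataset behind it."""
    subject: str
    predicate: str
    obj: str
    article_id: str   # hypothetical identifier of the source article
    dataset_id: str   # hypothetical identifier of the linked dataset
    language: str     # language the source article was written in

    def key(self):
        # The claim itself, independent of where or in which language it appeared.
        return (self.subject, self.predicate, self.obj)


def group_assertions(assertions):
    """Count how many distinct articles make each claim."""
    counts = Counter()
    seen = set()
    for a in assertions:
        pair = (a.key(), a.article_id)
        if pair not in seen:
            seen.add(pair)
            counts[a.key()] += 1
    return counts


# Made-up sample data: two articles agree, one makes an unusual claim.
ASSERTIONS = [
    Assertion("malaria", "is transmitted by", "mosquitoes", "article-1", "dataset-1", "en"),
    Assertion("malaria", "is transmitted by", "mosquitoes", "article-2", "dataset-2", "hi"),
    Assertion("malaria", "is transmitted by", "rabid dogs", "article-3", "dataset-3", "ga"),
]

counts = group_assertions(ASSERTIONS)
for claim, n in counts.most_common():
    print(" ".join(claim), "->", n, "article(s)")
```

The reader never opens the Hindi or Gaelic article here. The grouping works on the structured claim alone, and a rare claim still shows up in the list, where it can be followed back to its article and dataset.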
Nikesh Gosalia
I agree with you Gareth. One final area that I want to briefly touch upon and one specific question that I had Gareth is I think there’s been a lot of conversation in the last few years around impact. And there are probably different definitions around it. I think in the UK we have the Research Excellence Framework. And so, what is your take on what does impact mean to you. How are we progressing on that side? How do we kind of take more and more research out to the public so that they can understand it in maybe in simpler terms? Overall thoughts of that.
Gareth O’Neill
So, this is another podcast in itself. It’s one of those topics. Impact is huge, right. And ultimately comes down to what do you mean by impact. Let’s take the word that’s used in academia at a straight level. So, there’s what’s called the Journal Impact Factor, right, a JIF. And this is essentially how we measure the impact of journals. Now, the history of the JIF I think is simple. Research libraries, universities have strict fixed budgets for buying subscriptions to journals. And so, to help them pick what subscriptions should I buy because I can’t buy them all, and they’re not all relevant for my researchers, what are the most relevant journals from my group, right, in terms of the discipline and in terms of how relevant they are. I used the word relevant, I didn’t use the word good. So, relevant.
And so, the Impact Factor came along, looking at how they cite each other, communities cite each other, and how important those journals are to researchers, right. This is where the rise of the branded journal came. Suddenly, it became clear some were more equal than others, and suddenly were more important. And suddenly, the Impact Factor shifted from a tool to help libraries to buy subscriptions, to a recognition of excellence or importance. And thus, by extension, the peer review committees of these journals, and the journals themselves, became the gatekeepers to academic excellence. So, if you publish in Nature, you must be a genius. And you are the one that I need to work at my institute, right. So, impact has a very restricted sense there for how we work currently in academia. And that’s essentially impact in your research community. In other words, how important or how – and I use these words cynically, but how important you were in your community, right. So, that’s that impact. And what we know is that that’s rubbish. Impact, that was not what its original intention was. We know, for instance, in Nature, that not all articles are as important as the rest just because they are in Nature. You have the few that get mega-cited, because they probably are groundbreaking, and then the rest get pulled up because of the mega-citations when you average it out, right.
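The averaging effect mentioned above can be shown with a small Python sketch. The citation counts are invented for illustration, and this is a simplified mean over ten articles, not the actual Journal Impact Factor calculation: one heavily cited article pulls the journal-level average far above what a typical article in the same journal receives.

```python
from statistics import mean, median

# Made-up citation counts for ten articles in one hypothetical journal.
# Nine articles are cited a handful of times; one is "mega-cited".
citations = [2, 3, 1, 4, 0, 2, 3, 1, 500, 4]

average = mean(citations)    # what a journal-level average reports
typical = median(citations)  # what a typical article actually receives

print(f"mean citations per article:   {average}")
print(f"median citations per article: {typical}")
```

With these numbers the mean is many times larger than the median, which is why a journal-level average says little about any individual article published in that journal.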
So, I think impact is now coming to mean impact on society, right. So, whereas impact typically was measured within the research community, the current shift we see certainly in Europe and I think also America and beyond is impact outside of academia. How are you spending the public money and what are you doing, right. And this is where the Sustainable Development Goals come into play and get pulled into research, right. So, you see a lot of countries now saying yes, you can work on whatever topic you want as long as it fits one of these goals or objectives of these goals.
Now, that’s twofold. First of all, fundamental research doesn’t fit in there. And it shouldn’t. There should always be money for fundamental research and taking a chance on research that may go nowhere. That’s the point. We don’t know where the next breakthrough is, right. The second point, however, is we should be funding research that actually changes people’s lives and prioritizing that research which addresses immediate societal crises or challenges or needs, right. How do we eradicate poverty? How do we eradicate climate change? How do we eradicate infectious diseases? How do we make people’s lives better? How do we reduce suffering? How do we work together better? How do we reduce wars? These should have priority in our times. It shouldn’t be the complete picture. There should be the option for or the support for fundamental research, and the options to work on stuff that’s probably completely irrelevant, without picking a field there.
I think this is how we should be moving forward. I think it needs to be societally relevant. And in that context, impact in that sense, is multifold. Impact on your discipline, right, what is your impact as a linguist in the linguistic field, impact in the bigger picture of science. What relevance does the work I do in linguistics have for the drive towards the future of science, possibly for other disciplines, for new breakthroughs, whether that’s in information or in terms of products or services for the market? And then what is the relevance for society? Are we changing people’s lives? Are we making people’s lives better? And how are we doing that? Because these people are paying us. There should be a benefit for them too. So, this impact, it needs to be defined better. I know, for instance, in Europe the European Commission is working on pathways, impact pathways for open science. So, starting to define how we do open science, all the different practices, and how we can measure real impact of doing open science.
But like I said, I take a bigger approach there on impact in terms of within a discipline, within science, and then within society. And the correlations or linkages between those. And of course, it is impact for the person, right. What does it have for me as a researcher, what do I get from it? Because why else am I doing it? Yes, I get the amazing experience of a breakthrough, and I feel like I’ve contributed. But also, in my career, what do I get? Because that’s what every person asks, right. How do I achieve what I want to achieve in my lifetime or in my career?
So, impact is on many, many levels, and many, many sectors, right. It’s a matrix. And some should be prioritized and maybe some lesser prioritized, or there should be some form of ranking, I don’t know. And it depends on who is looking. That ranking will depend on who is looking. A citizen will look and say, what have you done for me? A researcher will say, what do I get? What does science say? Science says we go forward.
Nikesh Gosalia
Absolutely, yeah. This is brilliant. And like you rightly said, Gareth, I think we need a lot more time to discuss some of these things. Because this has opened up so many questions, so many thoughts in my mind. But just to kind of end our show and let’s end on something light. I don’t know if this still holds true, but I thought I’ll check with you Gareth. I mean, we read in your CV that you were a pretty avid seafarer.
Gareth O’Neill
I was out on a boat yesterday. I got burned actually.
Nikesh Gosalia
Oh really, okay. Why do you like sailing? Is there a story behind it? Do you kind of pursue that even now?
Gareth O’Neill
Well, I was born on an island, and I need to be surrounded by water whether around me physically or literally out on the water. I mean, I live in Amsterdam because there’s over 175 canals. I am surrounded by water here. So, it’s a good balance for me that I can leave Ireland and not be completely surrounded by water and move here and still be surrounded by water in the canals and there’s a big lake here and so forth.
Sailing, yeah, it’s more of I guess, the word is a passion maybe. I just like being on the water. There is a history to it. In Ireland we don’t have that many boats actually, but we have some traditional boats. I’ve actually built a very archaic rowing boat in Ireland made from hazel rods and animal skins. Although we didn’t use animal skins, we actually used like a type of tarpaulin to avoid that issue. But a colleague of mine, a friend of mine, he’s a boat builder. He spent his entire life on boats in the Netherlands, and he has built what we call an Irish Hooker. A Hooker is a special type of traditional Irish boat. There are different sizes, but the big one is about 35 feet long. It usually has red sails. And when it sails it gets very sharp to the water. And these boats were used for transporting goods on the West Coast of Ireland, typically bringing turf or wood even over to the islands which have no wood, no trees, so needed these materials, these goods to survive. It was originally a workhorse, now it’s a pleasure boat. There’s not many actually in the world. But there is one or two here actually in Amsterdam built by my good friend, and he goes out on his boat every now and again. So, we used to go out sailing on the IJsselmeer. This is the lake in the Netherlands which used to be the former South Sea, and then they just built a 30-kilometer dyke across the ocean and turned it from a saltwater lake into a freshwater – sorry, a saltwater sea into a freshwater lake.
If you come to Amsterdam, you may sometimes catch me out there on the IJsselmeer on the boat in good weather.
Nikesh Gosalia
Really nice. Thank you for sharing that Gareth. One final question. Because you are very knowledgeable, you are very passionate about science, open science, just for the benefit of listeners, where can we get to find more about you or everything that you are working on or anything that you have worth sharing if you don’t mind sharing.
Gareth O’Neill
Well, I guess, this podcast is a good way. I mean, I do a lot of different things in terms of work and outside of work linked to research. I don’t hold a blog. I don’t have time for it at the moment. But what I would say is, I mean, I do have an ORCID account. I am sure people can find me on ORCID if they want to see some of the stuff I’ve done. I do have a Twitter, but I’ve been very lazy the last year or so with this. But I think otherwise, just type my name into Google every now and again and see if something useful or interesting is there for people.
I do end up giving a lot of talks, especially for researchers, to explain open science to them and why we do this and how we should do this. So, I mean, I am out there on the net, people can find me, yeah. Put my name in, you probably get this. I know there is Gareth O’Neill in Ireland who plays Gaelic football, you might get him from Crossmaglen. So, you’ll just have to be a researcher and like delimit all the finds and I am sure you’ll find me out there somewhere.
Nikesh Gosalia
Gareth, thank you so much for joining us, and I am sure the listeners will find our conversation together extremely helpful. Thank you once again.
Thank you everyone for joining us. You can subscribe to this podcast on all major podcast platforms. Stay tuned for our next episode.
All Things SciComm is a weekly podcast brought to you by ScienceTalks. You can subscribe on Apple Podcasts, Spotify, and Google Podcasts.