Rebooting AI for Māori Language Learning
Angela Glover Blackwell in conversation with Keoni Mahelona
Discover the powerful role of artificial intelligence in preserving indigenous languages in our latest episode, where Angela Glover Blackwell engages in a compelling discussion with Keoni Mahelona, the talented CTO of Te Hiku Media. They dive into the rich history of the Māori people, uncovering the injustices they've faced and the ways AI is paving the way toward the revitalization of their language.
This episode also explores the pivotal role of children in this linguistic renaissance, highlighting the emergence of Māori language immersion schools, and the contentious issue of data sovereignty in the world of AI. Can we truly decolonize AI to prevent the exploitation and colonization of indigenous data? The essence of this episode lies in the exploration of AI's significant role in fostering equity, justice, and the preservation of indigenous languages. This isn’t just a conversation about technology; it’s about history, culture, equity, and most importantly, the future.
Keoni Mahelona (kanaka Maoli) is the driving force behind the development of digital technologies that aim to protect and promote indigenous languages and knowledge. He makes decisions every day to protect the sovereignty of data, from the digital tools deployed for advanced projects to the storage and sharing of data in appropriate and secure ways.
Angela Glover Blackwell: 0:05
Welcome to the Radical Imagination Podcast, where we dive into the stories and solutions that are fueling change. I'm your host, Angela Glover Blackwell. Artificial intelligence, or AI, is impacting our lives in ways that we are still discovering.
Keoni Mahelona: 0:22
You're in the situation where people could end up being wrongfully convicted for crimes that they didn't commit because some algorithm or some software is being used outside of the original scope for which it was planned.
Angela Glover Blackwell: 0:35
In season three of Radical Imagination, we looked into the ethical issues around AI and how the technology can exacerbate inequality and injustice, but AI can be harnessed to actually advance equity. One organization in New Zealand is working to do just that. Te Hiku Media, a non-profit radio station, is using AI technology to revive and preserve the language of the Māori. Māori are the indigenous people of New Zealand, or Aotearoa, the Māori-language name for New Zealand. For more on this, we're joined by Keoni Mahelona, the Chief Technology Officer at Te Hiku Media. Keoni, welcome to Radical Imagination.
Keoni Mahelona: 1:17
Aloha, it's lovely to be here.
Angela Glover Blackwell: 1:19
Give us some context on who Māori are and the factors that led to them losing their language over time.
Keoni Mahelona: 1:26
I guess I should start out by saying: ʻO Keoni kaʻu inoa, no Hawaiʻi au. My name is Keoni, I'm actually from Hawaii and I'm a Native Hawaiian, and so I need to preface what I say with that, because I am sort of speaking about Māori, and really Māori should be the ones speaking about the historical injustices. But I am mindful that I'm here representing Te Hiku Media, a Māori organization, and that the board of our organization, who are representative of the five tribes of the far north of Aotearoa, have also supported me in being here and speaking on behalf of Te Hiku Media. So Māori are the indigenous people of Aotearoa, just as there were many Native American peoples indigenous to the Americas pre-colonization. The Pacific was sort of, I guess, the last place that the Western powers reached in terms of colonizing, and Aotearoa was one of them. In 1840, a number of tribes of Aotearoa, or iwi as they call themselves, signed the Treaty of Waitangi with the British Crown at the time to sort of, I guess, create a situation whereby British subjects could live and operate in Aotearoa alongside Māori. But the Māori version of the Treaty of Waitangi, which is the actual version upheld in the courts today, was loosely translated into English in favor of what the Crown wanted. Obviously they broke a lot of the agreements in the treaty, as the United States has done with treaties with the indigenous people of America, as part of colonization and racial injustice. There were numerous attempts by the British Crown, by the colonial government – this is the 19th and 20th centuries – to make the Māori language extinct, to make the Māori culture extinct and also to make Māori extinct. So the Māori language was in sharp decline through the earlier 20th century, through various acts by the government to outlaw Te Reo Māori. And then, as we do, we fight, and we continue to fight, and we still are fighting to this day to preserve our languages and our culture. It wasn't until, I think, 1987, when the Māori Language Act passed, which made Te Reo Māori an official language of Aotearoa, and that sort of set the scene, at least in a political sense, for the revitalization that would be supported through the settlement of past grievances committed by the Crown. Since Te Reo Māori became an official language of Aotearoa, there has been other legislation that has enabled the language to continue to thrive and to support those who are actively fighting to revitalize the language. So we have the iwi radio legislation, which gave Te Reo Māori the right to be broadcast on AM and FM radio frequencies, and that's where we sort of come along. In 1990, Te Reo Irirangi o Te Hiku o Te Ika started a radio station broadcasting in Te Reo Māori, the indigenous language, out of a portaloo that was converted into a little transmission site with an antenna on the top of a hill, and that was sort of the genesis of the work that we do at our organization to revitalize and promote Te Reo Māori.
Angela Glover Blackwell: 5:11
AI is rapidly evolving and it's impacting all facets of our lives. Sometimes it can feel overwhelming. Much of what we hear about AI is in the realm of ethics: biases in machine learning, harms from facial recognition software. But in the right hands and with the right goals, it can also be an incredible tool for social good. Talk about how this work to preserve the Māori language came about and how you are accomplishing it.
Keoni Mahelona: 5:44
In 2015-2016, we started on a project to take some of our native speakers' stories, which we had recorded on video and on cassette tapes, and make them available to fluent speakers of Te Reo Māori who wanted to improve their Te Reo, to adopt more of the native sound, as we say, the language that we had pre-colonization and that is nearly lost to this day, in the sense that there's only a handful of native speakers who grew up in that time. In doing that, we had to transcribe hours and hours of native speakers, and there's really only a handful of people who can understand the language that they use because of the nuances. There are dialects, idiomatic expressions, words that might be used that aren't used elsewhere in the country. So you need someone who really comes from that tribe or that language group to be able to transcribe and know what's there. In doing that project, it took a lot of time to transcribe some of the audio, and we were just like, 'oh, why don't we just get a computer to do it for us?' The technology existed in 2016 when we thought: let's teach machines to understand Te Reo Māori, to speak Te Reo Māori. And we knew that the hard problem would actually be around curating the data required to train a machine model, or quote-unquote "AI," to understand Te Reo Māori and to understand it well, understand it in a way that ensures the integrity of the language is maintained and that can also understand native speakers, in addition to your sort of contemporary speakers, who actually sound a bit different. It was like, 'ah, the technology is there, we can do this, why not?' It took a few years, and by 2018 we had launched a campaign to build the corpus that we knew we needed. We collected over 300 hours in 10 days by mobilizing the community, getting their support, and drawing on the trust that we have in the community. They trusted us to do the right thing with the data that we were collecting. And with that data, it only took a couple of months: pull some packages from the web, train some models, and there we go. We had Te Reo Māori speech recognition working at around a 15% word error rate, just three months after we collected the corpus we needed.
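For readers unfamiliar with the metric Keoni cites, word error rate (WER) is the fraction of words a recognizer gets wrong relative to a human reference transcript. A minimal sketch of the standard calculation, using an illustrative phrase rather than any real Te Hiku Media data or model output:

```python
# Word error rate: (substitutions + insertions + deletions) / reference word count,
# computed with a word-level edit-distance (Levenshtein) table.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Illustrative example only: one substituted word out of eight = 12.5% WER.
reference = "ka nui te mihi ki a koutou katoa"
hypothesis = "ka nui te mihi ki a koutou katou"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

A recognizer at 15% WER, as described here, gets roughly one word in seven wrong against its reference transcripts.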
Angela Glover Blackwell: 8:18
As with so much invention, necessity pushed in this direction. You needed to do the transcription, but what has happened as a result of now having this capacity? Is it helping the language to be taught in schools, or are other things happening in the broader community?
Keoni Mahelona: 8:33
One of the side effects of our corpus campaign is that we learned that a lot of people really liked the ability to just read phrases on their own time in Te Reo Māori and get feedback on their reading of those phrases, on their pronunciation. So with our speech recognition model we sort of took that, hacked it, and trained a pronunciation model. So now we have an app that you can download that helps you to improve your Te Reo Māori pronunciation. It can identify incorrect vowel and consonant sounds that you make and guide you on repeating those words or phrases to improve your pronunciation. And we've measured and actually seen that people are improving their pronunciation by using that tool. We also have a transcription tool, which is used by universities and other people in Aotearoa to help them transcribe audio, either for their research or for their businesses, and we also have an API that allows Māori developers to access some of the language tools that we've developed so they can build the games or apps that they see their communities need to help in language revitalization.
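Te Hiku Media's pronunciation model and API aren't documented in this conversation, so the following is a hypothetical illustration only of the general idea of giving feedback by aligning what a recognizer heard against the target phrase; the phrases are made up, and the recognizer output is stood in by a hard-coded string rather than any real API call:

```python
from difflib import SequenceMatcher

def pronunciation_feedback(target: str, recognized: str) -> list[str]:
    """Flag spans where the recognized phrase differs from the target prompt.

    Illustrative sketch only: a real pronunciation model, like the one
    described in the interview, works on audio and sounds, not on a
    text-vs-text comparison after the fact.
    """
    feedback = []
    matcher = SequenceMatcher(None, target, recognized)
    for op, t1, t2, r1, r2 in matcher.get_opcodes():
        if op != "equal":
            feedback.append(
                f"expected '{target[t1:t2]}' but heard '{recognized[r1:r2]}' "
                f"(around position {t1} in the prompt)"
            )
    return feedback

# Hypothetical example: a learner substitutes one vowel sound.
target = "kia ora koutou"        # illustrative prompt only
recognized = "kia oro koutou"    # pretend recognizer output
for note in pronunciation_feedback(target, recognized):
    print(note)
```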
Angela Glover Blackwell: 9:39
I'm really interested in anything that reaches the children, because that's how things really get back into the culture. So where are the children coming into this? Is it schools or elsewhere? And I'm also curious about whether young people are seeing this as kind of a rebellious act, or as expanding their sense of identity. Is any of that going on?
Keoni Mahelona: 10:00
That's a really, really good question, and you know, it's that intergenerational transmission of language and knowledge that is so important. There are Kura Kaupapa Māori schools, which are Māori language immersion schools. Quite a number of them do amazing jobs, not only raising children to speak te reo but teaching some really amazing students. So one area where we work with schools is in live streaming Māori speech competitions and Māori kapa haka performances. We live video stream these so people can watch the performances online. They can listen to them on the radio and then they can watch them on demand later. In the speech competitions you get a real good taste of where high schoolers are in terms of AI. I think they're on both sides of the fence, in terms of 'yep, let's use these AI tools' or 'no, we don't necessarily need them.' In terms of the tools that we've developed specifically, we've often focused on what we need as an organization, in terms of transcription tools and synthesis, and we've only just gotten into the area of thinking about language learning apps with our pronunciation app, or how we might support others. There is a Māori developer who actually builds Māori language games for young kids, and he uses our API to help measure students' pronunciation and give feedback to them.
Angela Glover Blackwell: 11:18
And while it's called artificial intelligence, it's really only human. It's still human-made. That's right: it's people who are building this software. I'd love to talk about how your team is being conscious of centering people through this work. Your work focuses on protecting indigenous data sovereignty. How has that come up as you were building these tools to parse data about your communities?
Keoni Mahelona: 11:42
Data is the most important thing for us. I mean, our team has an intimate knowledge of the data that we're collecting. We haven't just scraped massive amounts of data from the web to the point where we can't actually understand what biases we've picked up, what data we have, and therefore the results of the models we train. Because we put so much effort, I guess, into the curation of our data sets and the respect that we give them, when we see errors or bugs show up in the models we train, we can often pinpoint the source of the data that might have been the cause, or at least have a pretty good hypothesis as to why a model might fail a certain way, because we have that intimate connection with the data. When we collect data, we apply a Kaitiakitanga license, so we never claim ownership of anything, really. I mean, sure, in the Western sense and in the court of law we'll have to claim copyright ownership, but in terms of how our organization operates in Te Reo Māori and the Māori worldview, we don't own any of the data. We're just the guardians at this point in time, to protect that data, ensure that it's looked after with cultural protocols, and ensure that it's only used for the betterment of the indigenous people of Aotearoa. The organization already had the protocols; we just took them and applied them to digital data. Those protocols, for example, are the same protocols used around managing land, and indigenous people, in respect of land, don't necessarily have ownership of land. We actually have a responsibility, or a Kuleana, to protect that land, and if we look after the land, the land looks after us. That's a common phrase I think you hear from indigenous people. And so we just took that and applied it to this digital data that we're collecting. If we protect and respect this data, this data will look after us.
Angela Glover Blackwell: 13:43
When you're thinking about what you're doing with the language, do you see this as an innovation within AI?
Keoni Mahelona: 13:52
For us? No, but we knew it was going to be the most important thing that we do, because we don't have the billions of dollars with all the brightest engineers to make really cool foundational models and train really large models. But we knew we had something special around data, and we knew that not only was this the hardest thing we had to do, but it might be the most impactful thing that we can do. As a small nonprofit organization in the far north rural region of Aotearoa, we knew that the work we did around data, around respecting data, could have global implications for others in terms of how we look after data, and I think we're seeing that with people talking about the Kaitiakitanga license. Our work has been covered in some mainstream media places, which is really good, because it means that those people working at Big Tech are also hearing our story. And I mean, I am not a fan of Big Tech, but I know there are some good people in those organizations, and it's good that the right people hear the message and empower communities with the tools for them to support themselves rather than coming in and saving us.
Angela Glover Blackwell: 14:58
There's a lot that's been written about movements to decolonize AI, especially across the global South. Data can be used to exploit or to create something powerful. Talk just a little bit about these issues of decolonizing AI and making sure that it helps us build a more equitable society.
Keoni Mahelona: 15:18
The first-principles answer is: they've taken all our land and now they're after our data, right? And I say data is the final frontier of colonization. Right, they're after our language and our knowledge, because they've taken everything else. And I think that's the first thing in terms of what we mean by decolonizing AI. So first things first: we need to stop the big data grabs, the new land grabs. Right, and that's exactly what's happening. We see Big Tech hiring third parties to solicit data from indigenous communities. They're doing it, but they don't want people to know that they are doing it. That's why they hire third parties to do their dirty work, right? So that's the first one: stop the big data grabs, stop the theft of data. That's one thing that we advocate for. We encourage people not to put their things on free platforms online, because nothing is free in a capitalist society. Then you have training models to understand minority languages, and why might that be important? Well, that might be important if you want to surveil a particular ethnic group or a minority group. That's the other thing that we're combating in terms of decolonizing AI: preventing governments and corporations from doing harm with these tools. In our Kaitiakitanga license, we explicitly say that you are not allowed to use the tools that we've developed for surveillance or persecution or to further marginalize indigenous communities or Māori.
Angela Glover Blackwell: 16:47
I wonder whether you can tell me any stories of people who you've worked with where being able to access the language, seeing it respected and lifted up in this way, has meant something to them personally: opened up relationships, ideas, possibilities that people can say didn't exist before.
Keoni Mahelona: 17:07
In 2019, we gave a presentation at a conference in Hawaii, and there were a few other indigenous people watching our presentation, talking about this work that we did in terms of building Māori speech recognition, or automatic speech recognition, ASR. That inspired them. They saw that, 'well, gee, if the Māori can do it, so can we.' All we had to do was just tell a story about us saying we have to do this, we need to do it for the language, and let's do it. That was enough to inspire other indigenous people who have since gone on this journey as well, in terms of using AI tools to help their communities, be it with language revitalization or teaching their kids how to code or how to train models, those sorts of things. That's the story that I love: that we inspire one another as minorities, as people struggling to fight the colonizer, that we can come together and not be each other's savior, because we don't want it to play out that way. We want to get over the white savior complex. We want to empower communities to lead the change that they need, because they know what they need best. Our vision of AI is a tool to empower communities to lead the change that they need.
Angela Glover Blackwell: 18:22
That is so inspiring as a goal. I wonder how you feel about hope, whether doing the work that you're doing, seeing how a powerful tool like AI can be used to advance relationship building, cultural restoration, people realizing their power. What do you think about hope?
Keoni Mahelona: 18:44
I'm hopeful. I mean, when I see what other indigenous communities are doing and how they're using AI, there's definitely some hope there. There are universities in possession of just treasure troves of indigenous knowledge that is inaccessible because we don't have the resources to make it accessible, meaning there's no money to digitize it and then to translate it. There are not enough people to do that work. We actually see AI, or machine learning, computers, as a tool to help us speed up that process of making our cultural knowledge more accessible to our people. That is a key component of our cultural revitalization, in terms of language revitalization, in terms of our cultural knowledge that might have been lost. I'm just hopeful that these tools will empower us to bring back that knowledge that the colonizer tried to erase.
Angela Glover Blackwell: 19:49
Keoni, thank you so much for talking with us.
Keoni Mahelona: 19:52
Mahalo iā ʻoe. Thank you to you as well. It's been a lovely time recording with you.
Angela Glover Blackwell: 20:03
Keoni Mahelona is the Chief Technology Officer at Te Hiku Media. There are legitimate reasons to be concerned about AI and machine learning. Every ill prejudice and instinct to do wrong can be heightened by these technologies. At the same time, we have legitimate reasons for hope. Listen to Keoni suggest that the best of humankind can yank these technologies back from the possibilities of continued wrongs. This technology can be captured by our best instincts. In the right hands, with the right frameworks, it can be used to create broad social benefits, correcting historical wrongs and preparing young people to carry their culture into the future. Radical Imagination is a PolicyLink podcast produced by Futuro Media. The Futuro Media team includes Marlon Bishop, Andreas Caballero, Nour Saudi, Stephanie Lebow, Julia Caruso and Andy Bosnick, with help from Roxanna Agiri, Fernanda Santos, Juan Diego Ramirez and Roxanne Scott. The PolicyLink team includes Glenda Johnson, Loren Madden, Ferchil Ramos, Vanice Dunn, Perfecta Oxholm, Eugene Chan and Fran Smith. Our theme music was composed by Taka Yusuzawa and Alex Sugira. I'm your host, Angela Glover Blackwell. Join us again next time, and in the meantime, you can find us online at radicalimagination.us. Remember to subscribe and share. Next time on Radical Imagination: Rediscovering the Outdoors. In a not-so-distant past, Black people were the owners and operators of lodges and resorts during the darkest times of American life for Black people. See you next time on Radical Imagination.