AI is not going to take over just yet


I wish I had blogged about this at the time, but I didn’t. Here’s an article from the Institute of Electrical and Electronics Engineers (IEEE) predicting that 2017 would be the year of the self-driving car. I would bet a dollar that no one reading this in 2023 has a fully self-driving car. I would even go so far as to say that most people have, at most, a car that warns them when they drift out of their lane or something similar. Back then I was pretty certain that the transition to self-driving cars would take at least 10 years, and today I will say it again: I think we are still at least 10 years away from a substantial portion (say, more than 10%) of the cars on the road being self-driving. I don’t think all taxi drivers (or Uber, Lyft, etc. drivers) are going to be out of work in 10 years.

There are a number of reasons for this. Some are technical, but many are legal. When a fully self-driving car causes an accident, who is at fault? The driver? The car manufacturer? Some combination? As of right now, I think the laws in most countries still hold the driver liable. So there is not much point in having a self-driving car if you still have to pay attention to the road to avoid an accident.

I will make the same prediction for AI. Everyone is talking about ChatGPT, and with good reason: it is a huge leap forward, and I am very impressed by it too. It generates grammatical, sensible text, and it is often correct. However, I am not yet worried about it taking my job. I am not the first person to point this out, but ChatGPT seems really smart when you ask it about an area where you are a novice; when you try it on something you are an expert in, it is quite easy to find mistakes. Here is one I tried today. I asked ChatGPT to “Explain to me why the plural of goose is geese”. Here is the response:

The plural of “goose” is “geese” because it follows the historical development of English language.

English, like many languages, has irregular plurals for certain words. The word “goose” is one of those words, and its plural form “geese” has its origins in the Old English language.

In Old English, the plural of “goose” was “gēs”, which evolved into the Middle English word “gees” and then finally into the modern English word “geese”. This is why “geese” is used as the plural form of “goose” today.

It’s worth noting that not all animal names with a similar structure as “goose” follow this irregular pluralization. For example, the plural of “moose” is “moose”, not “meese”. In fact, English has many irregular plurals, which can sometimes make the language tricky to learn and use correctly.

This explanation is mostly correct, but I happen to know more about this subject. In particular, I know that a sound change called “umlaut” (in this case, a kind of vowel harmony) played an important role. I asked ChatGPT for clarification: “what role did umlaut play in this process?” It replied:

Umlaut did play a role in the development of the plural form “geese” from the Old English word “gōs”. Umlaut is a linguistic phenomenon that occurs in some languages, including Old English, where the vowel sound of a word changes due to the influence of a following vowel or consonant.

In Old English, the plural of “gōs” was “gēs”, where the plural marker “-es” was added to the singular form. However, due to the influence of the high front vowel “i” in the plural marker, the vowel sound in the root word “gōs” was altered through umlaut, resulting in the front vowel “ē” in the plural form “gēs”.

Over time, the plural form “gēs” evolved into “gees” in Middle English and finally into “geese” in Modern English, with the umlauted vowel sound being retained in the modern plural form.

So, in summary, umlaut played a role in the historical development of the plural form of “goose”, but it was just one factor among many that contributed to the evolution of the language over time.

Okay, so as it gets more specific, it gets more wrong. It says that the plural marker was “-es”, and then that the high front vowel “i” in that plural marker caused umlaut. But “-es” contains no “i”. So which is the correct plural marker: “-es” or “i”? I happen to know that the correct answer is “i”, but a novice would not know this. Next, I asked about the other factors: “what other sound changes played a role?” ChatGPT replied:

There were several sound changes that played a role in the development of the plural form “geese” from the Old English word “gōs”. Here are some of the main sound changes that took place:

  1. The Old English vowel “o” in “gōs” changed to the Middle English vowel “oo” (as in “goose”), due to a process called vowel breaking. This change occurred in the singular form of the word.
  2. The plural marker “-es” was added to the singular form “gōs” in Old English to form the plural “gēs”.
  3. The high front vowel “i” in the plural marker “-es” influenced the vowel in the root word “gōs” through a process called umlaut, resulting in the front vowel “ē” in the plural form “gēs”.
  4. In Middle English, the front vowel “ē” in “gees” changed to a back vowel “oo” (as in “goose”), due to another process of vowel breaking.
  5. Finally, the Middle English plural “gees” underwent a final sound change in Modern English, where the “s” at the end of the word became “se” and eventually merged with the vowel sound, resulting in the modern plural form “geese”.

So, in summary, a combination of vowel breaking, umlaut, and vowel shifting were the main sound changes that led to the development of the plural form “geese” from the Old English word “gōs”.

Here, it goes really wrong. I am fairly certain that vowel breaking has nothing to do with this. Vowel breaking is a sound change that usually turns a monophthong into a diphthong (see the Wikipedia entry on vowel breaking). I suspect ChatGPT thinks the “oo” in goose is a diphthong because of the double-“o” spelling, but that is just a spelling convention and has nothing to do with the actual pronunciation. Then it says that the “ē” in “gees” changed to a back vowel “oo”, which absolutely did not happen. Step 5 is also completely wrong.

I actually gave a four-minute talk on this subject at a work event several years ago. Here is a brief summary slide from that talk. (Note that during the presentation I mistakenly called the loss of schwa “vowel epenthesis”. It is actually vowel deletion; epenthesis is the insertion of a sound, not its deletion. I have corrected the slide with a little image editing.)

[Slide: summary of the history of goose/geese]

So, back to ChatGPT: if I were a fifth grader asked to research why the plural of goose is geese and turned in ChatGPT’s first response, I would probably get a good mark. If I tried the same thing as a Ph.D. student, I would get a poor grade.

Bing recently announced that it is piloting a similar system, and the results there are not just a little inaccurate but downright bizarre. I highly recommend this article by Simon Willison; I laughed out loud, though by the end it gets kind of creepy. People are already figuring out how to trick these systems, too, and since they learn from user input, if people start feeding them junk, they will turn into junk. That is exactly what happened the last time Microsoft released a chatbot, in 2016: it took only a day or so for users to turn it into a racist troll.

Unfortunately, the internet is full of comments by racist trolls, which means that any deep learning model trained on internet data has to deal with that too. This article from Time magazine details how OpenAI outsourced the manual labeling of such content to hundreds of workers in Africa.

What about SEO?

Search-engine optimization (SEO) is a voodoo science in which people try to get their websites ranked higher in search engine results (especially Google’s) using a variety of tricks, such as adding the right markup and tags. The biggest lever, though, is the content itself: if you write a lot of blog posts on the topics you want to be known for, you will get more traffic. Say you make a widget and want to sell it. You might publish a weekly blog post about it to attract attention. If this is the only widget you sell, it seems unreasonable to expect great new information about it every week. But you need that SEO juice, so every week you write an article with a catchy, click-baity title like “The one widget you will ever need”, full of fairly generic content that is not very different from last week’s. Twenty years ago, the internet was not full of this sort of stuff, but now it is.

Guess who (or what) else is really good at writing this sort of content? Yep, you guessed it: ChatGPT. I have heard quite a few people talk about having ChatGPT generate this sort of content for them, and I am pretty certain some of it is already generated that way. We have already established that the intended reader of this sort of drivel is not a human but a machine (the search engine). So now machines are writing content to be consumed by machines. This realization has thrown me into something of an existential crisis. What is the point of it all? If this is indeed the route we end up taking, finding truly relevant, expert information on the internet is only going to get harder than it already is.

Obligatory ChatGPT screenshots
