GPT-3 is one of the most versatile and transformative components that you can include in your framework, application or service. However, sensational headlines have obscured its wide range of capabilities since its launch. Let’s take a look at the ways that companies and researchers are achieving real-world results with GPT-3, and examine the untapped potential of this 'celebrity AI'.
The interpretive powers of the GPT-3 autoregressive language model stirred up popular tech headlines when it was unveiled by OpenAI in 2020. It could apparently compose feature articles, write poetry, and even talk to the dead.
GPT-3 has perhaps been a victim of its own publicity; it's a vanguard product in the NLP space, but widespread public interest in the possibility of an effective AGI (which GPT-3 is not), combined with tech media's determination to coopt any new AI product into its annual round of febrile headlines, has left GPT-3 rather misunderstood.
Nonetheless, its core capabilities have since inspired a slew of startups across a range of sectors.
Here we'll take a look at the usable scope of GPT-3 for business purposes, and at some of the companies that have taken the lead in this respect.
First, let's take a look at the strengths and weaknesses of the various methods by which GPT-3 answers a prompt.
Models available for transformations in GPT-3 include Davinci, Curie, Babbage and Ada, each of which has different capabilities in terms of speed, quality of output and suitability for specific tasks.
For instance, Davinci is the most sophisticated of the available models, and is the most likely to produce complex, usable output that explores (or at least appears to explore) higher-level domain thinking and analysis. Later we'll also take a look at Davinci Instruct, which is capable of following more specific commands regarding the formatting and domain-specificity of its output.
However, some of the leaner and less computationally demanding models are more than adequate for simpler prompts, saving latency and API request costs. For instance, novelist Andrew Mayne has found that much of the most widespread knowledge available to GPT-3 is accessible through lower-tier models than Davinci.
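One way to act on this trade-off is to try the cheaper models first and escalate only when the answer is not good enough. The Python sketch below is an illustration only: it is not taken from Mayne's experiments or from OpenAI's documentation. The function name, the `complete` callable (a wrapper you would write around whatever API client you use) and the `is_acceptable` check are all hypothetical, and the ordering of the four engines by cost is an assumption based on the descriptions above.

```python
from typing import Callable, Optional, Sequence, Tuple

# Engine names as listed in the article, assumed cheapest/fastest first.
ENGINES_BY_COST: Sequence[str] = ("ada", "babbage", "curie", "davinci")


def answer_with_cheapest_engine(
    prompt: str,
    complete: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
    engines: Sequence[str] = ENGINES_BY_COST,
) -> Optional[Tuple[str, str]]:
    """Try each engine in order and return (engine, answer) for the
    first answer that passes is_acceptable, or None if none does.

    complete(engine, prompt) is a hypothetical caller-supplied wrapper
    around the real API call; it is injected so the selection logic can
    be tested without network access.
    """
    for engine in engines:
        answer = complete(engine, prompt).strip()
        if is_acceptable(answer):
            return engine, answer
    return None
```

Escalation costs extra requests whenever the cheaper models fail, so a scheme like this only pays off when most prompts are simple enough for the lower tiers. For prompts known in advance to be demanding, calling Davinci directly is likely to be cheaper.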
Treated as a straightforward 'global oracle', GPT-3 is subject to the same inaccuracies and inexactitudes as the publicly available content that it was trained on, and the depth and truth of its responses on any subject are in proportion to the subject's representation (and misrepresentation) on the internet, and in the standard datasets that inform it.
For instance, Davinci knows a little about Charles Dickens' output, but will give up if you second-guess the first response.
Answering the same question, 'Who wrote the novel Bleak House?', the other models seemed less well-read, even on multiple attempts.
But within narrower domains, usually away from subjective arts and culture topics, GPT-3 proves extraordinarily adept.
The GPT-3 model was developed through unsupervised training, wherein it had to discover, rationalize and categorize vast volumes of data without being explicitly told what any of it 'meant'. In the course of training it was necessary for the model to distil labels from the available entities – such as picture captions and descriptions – that accompanied the content, or that had some kind of discoverable relationship with it.
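For an autoregressive language model, the 'labels' come from the text itself: each next token serves as the target for the tokens that precede it, so nobody has to annotate anything by hand. The following Python sketch shows how such training pairs can be derived from raw text. It is a simplified illustration, not OpenAI's pipeline: the function name is hypothetical, and splitting on whitespace with a four-token context are assumptions made for readability (GPT-3 itself works on sub-word tokens and a far longer context).

```python
from typing import List, Tuple


def next_token_examples(
    text: str, max_context: int = 4
) -> List[Tuple[List[str], str]]:
    """Turn raw text into (context, next_token) training pairs.

    Whitespace tokenization and the small context size are
    simplifications for illustration only.
    """
    tokens = text.split()
    examples = []
    for i in range(1, len(tokens)):
        # Keep at most max_context tokens immediately before position i.
        context = tokens[max(0, i - max_context):i]
        examples.append((context, tokens[i]))
    return examples


if __name__ == "__main__":
    sentence = "Bleak House was written by Charles Dickens"
    for context, target in next_token_examples(sentence):
        print(" ".join(context), "->", target)
```

Every sentence in the training corpus yields many such pairs, which is why a model trained this way can absorb very large volumes of unlabeled text.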
Where a subject has a rigid and agreed taxonomy and hierarchy (such as mathematics), it's much easier to be certain of the relationships between sub-entities in that domain, and to create generative rules that are accurate.
Programming falls into this category, and GPT-3 is very good at it.