DISCLAIMER: DubBot does not endorse using ChatGPT as the only solution for generating regex and XPath expressions. Testing and verification by a qualified human are a must. The DubBot Support Team is here to help you with that.
With ChatGPT featured in just about every news outlet and tech magazine, we wanted to see whether this hot new AI could help our clients write the regular expressions (regex) used in Custom Policies checks.
But first, a little background on ChatGPT and regex.
Author: What is ChatGPT?
ChatGPT: ChatGPT is a language model developed by OpenAI. It is trained to generate text based on the input provided to it, and can respond to a wide range of questions and prompts, such as answering questions about general knowledge, providing definitions, or generating creative writing.
According to W3Schools, "A regular expression (regex) is a pattern of characters. The pattern is used to do pattern-matching 'search-and-replace' functions on text."
To see how ChatGPT would respond, we came up with some regex examples that DubBot clients could benefit from when setting up their Custom Policies.
Example 1
Author: regex to check for multiple, consecutive <br> or <br /> tags in HTML
ChatGPT: In regular expression, you can use the following pattern to match multiple consecutive <br> or <br /> tags in HTML:
/<br\s*\/?>\s*(<br\s*\/?>\s*)+/i
Note: The \s in the pattern matches any white space character, the * means zero or more of the preceding character, the ? means the preceding character is optional, and the i at the end makes the match case-insensitive.
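If you want to try ChatGPT's pattern against a few strings yourself, here is a minimal Python sketch. It assumes the JavaScript-style literal /.../i translates to a pattern for Python's re module with the re.IGNORECASE flag (the escaped \/ becomes a plain /). The function name has_consecutive_br and the sample strings are our own, for illustration only.

```python
import re

# ChatGPT's /<br\s*\/?>\s*(<br\s*\/?>\s*)+/i rewritten for Python's re module.
# The trailing "i" flag becomes re.IGNORECASE; "\/" is simply "/" here.
CONSECUTIVE_BR = re.compile(r"<br\s*/?>\s*(<br\s*/?>\s*)+", re.IGNORECASE)


def has_consecutive_br(html: str) -> bool:
    """Return True if html has two or more <br> tags separated only by whitespace."""
    return CONSECUTIVE_BR.search(html) is not None


samples = [
    "Line one<br>Line two",             # a single break
    "Line one<br><br>Line two",         # two <br> tags in a row
    "Line one<br />\n  <BR/>Line two",  # mixed forms, whitespace, uppercase
    "<br>text<br>",                     # breaks separated by content
]

for html in samples:
    print(has_consecutive_br(html), repr(html))
```

Because the pattern allows \s* between tags, breaks split across lines or indented in the source are still caught, while breaks separated by real content are not.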
Feedback from our Support Engineer:
What ChatGPT gave us: /<br\s*\/?>\s*(<br\s*\/?>\s*)+/i
This one does the job nicely. It is a little longer than it needs to be (the <br\s*\/?>\s* unit is written out twice) and could be presented more straightforwardly, but it works just fine.
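As one illustration of a more compact form (our own sketch, not necessarily the rewrite our Support Engineer had in mind), a {2,} quantifier can say "two or more tags" directly: /(<br\s*\/?>\s*){2,}/i. The Python sketch below, assuming the same translation to Python's re module as before, checks that both patterns find the same runs of breaks in a few made-up samples.

```python
import re

# ChatGPT's pattern: one tag, then one or more further tags.
ORIGINAL = re.compile(r"<br\s*/?>\s*(<br\s*/?>\s*)+", re.IGNORECASE)
# Compact alternative: the same tag-plus-whitespace unit, repeated two or more times.
COMPACT = re.compile(r"(<br\s*/?>\s*){2,}", re.IGNORECASE)


def same_result(html: str) -> bool:
    """True if both patterns agree on whether, and where, a run of breaks occurs."""
    a = ORIGINAL.search(html)
    b = COMPACT.search(html)
    if a is None or b is None:
        return a is None and b is None
    return a.span() == b.span()


samples = [
    "Line one<br>Line two",
    "Line one<br><br>Line two",
    "Line one<br />\n  <BR/>Line two",
    "<br>text<br>",
    "<br/><br/><br/>",
]

for html in samples:
    print(same_result(html), repr(html))
```

Both forms describe the same set of strings; the compact one just avoids repeating the tag pattern.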
Example 2
Author: regex to check for lowercase of the first letter in each word of "Christopher Newport University"
ChatGPT: Here's a regex pattern to match the proper capitalization of the first letter in each word of the string ‘Christopher Newport University’:
/^Christopher Newport University$/
This pattern will match the exact string Christopher Newport University with no variations. If you want to match the same string with variations in capitalization, you can use the following pattern:
/^[Cc]hristopher [Nn]ewport [Uu]niversity$/
This pattern uses square brackets to match any of the specified characters, and the ‘^’ and ‘$’ characters to anchor the match to the start and end of the string, respectively.
Feedback from our Support Engineer:
What ChatGPT gave us: /^Christopher Newport University$/
Or /^[Cc]hristopher [Nn]ewport [Uu]niversity$/
At first glance, ChatGPT does not seem to understand the intent of the query. We want to find every place where Christopher Newport University is written without the proper capitalization. The first suggestion finds only the properly capitalized name and, because of the ^ and $ anchors, only when the name is the entire string being checked. The second suggestion matches the name with both proper and improper capitalization, so it cannot tell the two apart.
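To make the problem concrete, here is a minimal Python sketch that runs ChatGPT's two suggestions, translated to Python's re module with no flags (so matching is case-sensitive, as in the originals). The variable names and sample sentences are our own, for illustration.

```python
import re

# ChatGPT's two suggestions, case-sensitive as written.
EXACT = re.compile(r"^Christopher Newport University$")
BRACKETED = re.compile(r"^[Cc]hristopher [Nn]ewport [Uu]niversity$")

samples = [
    "Christopher Newport University",              # correct casing, whole string
    "christopher newport university",              # the mistake we want to flag
    "Welcome to christopher newport university.",  # the mistake inside a sentence
]

for text in samples:
    print(repr(text))
    print("  exact pattern matches:    ", EXACT.search(text) is not None)
    print("  bracketed pattern matches:", BRACKETED.search(text) is not None)
```

The exact pattern only confirms correct casing, the bracketed pattern flags the correct spelling as readily as the mistake, and neither matches the name once it appears inside a longer sentence.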
Our fix would look like this: (christopher [Nn]ewport [Uu]niversity)|([Cc]hristopher newport [Uu]niversity)|([Cc]hristopher [Nn]ewport university)
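Here is a minimal Python sketch that exercises the fix above, unchanged apart from being split across lines. It assumes the check runs case-sensitively; with an ignore-case flag, the first alternative would also match the correct spelling. The helper name find_miscased and the sample text are hypothetical, for illustration only.

```python
import re

# The Support Engineer's fix: each alternative requires at least one lowercase first letter.
# Must be case-sensitive (no re.IGNORECASE), or the correct spelling would match too.
MISCASED = re.compile(
    r"(christopher [Nn]ewport [Uu]niversity)"
    r"|([Cc]hristopher newport [Uu]niversity)"
    r"|([Cc]hristopher [Nn]ewport university)"
)


def find_miscased(text: str) -> list[str]:
    """Return each improperly capitalized occurrence of the university's name."""
    return [m.group(0) for m in MISCASED.finditer(text)]


page = (
    "Christopher Newport University is in Newport News. "
    "Apply to christopher Newport University today, "
    "or visit Christopher newport university."
)
print(find_miscased(page))
# ['christopher Newport University', 'Christopher newport university']
```

Note that, as written, the fix targets lowercase first letters, which matches the original question; an all-caps variant such as CHRISTOPHER NEWPORT UNIVERSITY is not flagged.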
Using ChatGPT certainly gives you a good starting point if it understands the intent of your question, and this language model offers some advantages: increased productivity, an eased learning curve, and shortened development time. Keep in mind, though, that ChatGPT's responses may contain biases present in the text used to train it, and those biases may be reflected in the generated expressions.
So let’s be clear! The resulting outputs still need to be tested and verified by a real person and, as our second example shows, may need to be tweaked to function correctly and return the desired result. The same is true when using ChatGPT to generate XPath expressions.
One last thought worth mentioning: DubBot proudly offers users help with setting up any custom Policies that become priorities for flagging content. DubBot users can add custom Policies as they need them, but they also have access to a team of technical developers who can check that a Policy uses the best technology option (simple text, regex, or XPath) and talk through its intent to suggest how it might be improved. With the second example, for instance, we might suggest that a simple text check could do the trick instead of a regex.
Please contact DubBot support at help@dubbot.com if you have any questions.