Regular Expressions and Grep

If you've ever had to search, parse, or edit blocks of text programmatically, chances are you're familiar with regular expressions (also known as regex or regexp). Let's cover what regular expressions are, what they're used for, their benefits and limitations, and some examples.

What are Regular Expressions?

A regular expression is a special text string, written in a compact pattern language, that describes a search pattern for matching strings in text. Its flexible and powerful syntax lets you create detailed search patterns, from simple words and phrases to complex constructs like e-mail addresses and phone numbers. It's much more powerful than a simple string comparison, and is almost universally supported across programming languages, frameworks, and text editors.

Most Linux distributions come with the GNU grep command, which supports regex. Grep stands for “global regular expression print”. It searches the files you name for lines that match a specified pattern and prints each matching line.

Simple grep examples:

grep 'word' file1 file2 file3
grep 'username' /etc/passwd

Instead of a literal word, you can give grep a regex that describes a pattern of characters to match.
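To make the idea concrete, here is a minimal sketch of grep-style line filtering written in Python rather than with grep itself. The helper name `grep_lines` and the sample lines are hypothetical, and Python's regex syntax differs in places from grep's default syntax (for example, `\d` is a Python shorthand for a digit).

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for pattern, like a basic grep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample data in the style of /etc/passwd
sample = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash",
]

# A plain word works as a pattern, just like grep 'word' file
print(grep_lines("daemon", sample))

# A pattern: lines with a four-digit user ID that end in "bash"
matches = grep_lines(r":x:\d{4}:.*bash$", sample)
print(matches)
```

The second call returns only the `alice` line, because it is the only one with a four-digit user ID and a bash shell.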

Benefits of Regex

Regexes are much more flexible than traditional text searches. They can detect almost any pattern of letters, numbers, symbols, special characters, and even metacharacters. Where traditional searches look for exact matches, regexes can match patterns of varying length. This makes them useful for finding constructs such as email addresses, IP addresses, URLs, and phone numbers.

Regexes are also concise. A single regex string can contain multiple search terms, perform multiple operations, and return multiple matches. This makes them very easy to implement, reuse, and modify.
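As a small illustration of several search terms packed into one expression, the sketch below uses alternation in Python to find three log levels in a single pass. The log text is made up for the example.

```python
import re

# Hypothetical log text
log = "WARNING: disk at 91%. Error: write failed. info: retrying. CRITICAL: disk full."

# One expression, three search terms, matched case-insensitively
levels = re.findall(r"\b(?:error|warning|critical)\b", log, flags=re.IGNORECASE)
print(levels)  # ['WARNING', 'Error', 'CRITICAL']
```

Adding another search term only means adding another alternative between the parentheses.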

Limitations of Regex

Regex has a steep learning curve. Even basic regular expressions are difficult to break down into their base operations. Unlike code in more verbose languages such as Python, a regex can rarely be read at a glance: understanding one requires detailed knowledge of the syntax. This can make expressions difficult to troubleshoot, especially for beginners. This is best expressed in the famous quote by Jamie Zawinski:

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

Heavy text processing can also be slow, depending on the complexity of the expression and the amount of text to search. There are ways to improve performance, such as compiling an expression once and reusing it (Python's re module also caches recently compiled patterns automatically), but it ultimately comes down to the efficiency of the expression.
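A minimal sketch of the compile-once approach in Python: the pattern is compiled outside the loop and the resulting object is reused for every line. The IP address pattern here is deliberately loose (it would also accept out-of-range values such as 999.999.999.999) and the log lines are hypothetical.

```python
import re

# Compile once, outside the loop, and reuse the pattern object
ip_pattern = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# Hypothetical access-log lines
lines = [
    "10.0.0.1 GET /index.html",
    "no address here",
    "192.168.1.20 POST /login",
]

hits = []
for line in lines:
    match = ip_pattern.search(line)
    if match:
        hits.append(match.group(0))

print(hits)  # ['10.0.0.1', '192.168.1.20']
```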

How Is Regex Used? Use Cases and Examples

Regex has a number of use cases, including:

Searching

Regex is designed for searching. Traditional search methods might only let you search for a specific string, but regex offers much more flexibility and control over how searches are performed.

Example:

Imagine you have a text document (such as a log file) and you want to find all instances of an email address appearing within the document. How would you go about this? You could start by searching for the "@" character, or for ".com", but what if the document also includes Twitter handles or website URLs? What about email addresses that end in ".edu", or ".net"? You would likely need to run multiple searches at a time and use complex string manipulation rules to extract out each potential match.

Alternatively, you could create a single regular expression that searches specifically for email addresses. One method is to use the following expression:

[a-zA-Z0-9-.!#$%&'*+\/=?^_`{|}~]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9]{2,3}

Let's break down this expression:

[a-zA-Z0-9-.!#$%&'*+\/=?^_`{|}~]+ Match one or more of the letters, numbers, or special characters listed.

@ Match the "@" symbol.

[a-zA-Z0-9-]+ Match one or more letters, numbers, or hyphen characters.

\. Match a period.

[a-zA-Z0-9]{2,3} Match a run of two or three letters or numbers, such as "com" or "co".

With this expression, we can return all instances of "user@example.com", "user.name@123company.co", or even "super_user+$10k@dash-co.net", but not "@example" or "http://example.com".
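The claims above can be checked with a short Python sketch. The expression is the article's own, used unchanged; the sample text is assembled from the addresses mentioned above.

```python
import re

# The article's email expression, unchanged
EMAIL = re.compile(r"[a-zA-Z0-9-.!#$%&'*+\/=?^_`{|}~]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9]{2,3}")

text = (
    "Contact user@example.com or user.name@123company.co. "
    "Billing: super_user+$10k@dash-co.net. "
    "Not emails: @example, http://example.com"
)

found = EMAIL.findall(text)
print(found)
# ['user@example.com', 'user.name@123company.co', 'super_user+$10k@dash-co.net']
```

One consequence of the `{2,3}` quantifier is that longer endings are cut short: searching "user@example.info" with this expression returns "user@example.inf". Treat the expression as a simple illustration rather than a complete email matcher.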

User Input Validation

Regex is often used as an input validation tool. Imagine you have a website where users can sign up by providing their email address. Before the registration can be completed, the user's email address must be verified. With regex, we can perform a simple validation test that checks the formatting of the user's address before we allow them to register. We can even use JavaScript to perform this test and notify the user in real time, while using the same expression used in the previous example.
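The paragraph above mentions JavaScript; the same check is sketched here in Python to stay consistent with the other examples. The function name `is_valid_email` is hypothetical. The key difference from searching is that validation should match the whole input, which `fullmatch` does.

```python
import re

# The article's email expression, reused unchanged
EMAIL = re.compile(r"[a-zA-Z0-9-.!#$%&'*+\/=?^_`{|}~]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9]{2,3}")

def is_valid_email(address):
    """Return True only if the entire string has the shape of an email address."""
    return EMAIL.fullmatch(address) is not None

print(is_valid_email("user@example.com"))      # True
print(is_valid_email("user@example"))          # False
print(is_valid_email("see user@example.com"))  # False: extra text around the address
```

Because this expression is intentionally simple, it also rejects some valid addresses, such as those ending in ".info" or using a subdomain. A format check like this only confirms the shape of the input, not that the mailbox exists.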

String Replacement and Masking

We discussed how regex can be used to find patterns of text within larger documents. But what if you wanted to replace, mask, or delete certain text?

Example:

Consider a payment processing service that occasionally logs sensitive data such as credit card numbers and bank account details. To protect their users' privacy, the service should automatically scrub this data before sending its logs to a centralization service. But how do we detect and erase this data after the log has already been written?

With regex, we can create expressions to detect numbers matching the formats used by credit card vendors. We can then use a method like Python's re.sub() to substitute each instance with another value.
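Here is a minimal sketch of that scrubbing step using re.sub(). It assumes a simplified card format (16 digits in groups of four, optionally separated by spaces or hyphens); real card formats vary by vendor, and the `scrub` helper and log lines are hypothetical.

```python
import re

# Simplified assumption: 16 digits in groups of four, with optional space or hyphen separators
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def scrub(line):
    """Replace anything that looks like a card number with a placeholder."""
    return CARD.sub("[REDACTED]", line)

print(scrub("payment accepted card=4111-1111-1111-1111 amount=10.00"))
# payment accepted card=[REDACTED] amount=10.00

print(scrub("order 12345 shipped"))
# order 12345 shipped
```

A production scrubber would likely need vendor-specific patterns and a checksum test to reduce false positives.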

Using Regex in Mezmo's Stream Editor

Log messages don't always appear perfectly formatted. This is why the Mezmo (formerly LogDNA) web app includes a stream editor feature that lets you change the formatting of your log data in real time. You can use a regular expression as your search term, as well as toggle case sensitivity and global searching. This works similarly to the sed command, while also formatting live log data.

Example:

Imagine you have an application that writes multiline logs to syslog. To avoid generating multiple syslog events from a single application event, the syslog service automatically escapes newline characters. This ensures each event only writes a single syslog message, but this makes the log stream appear cluttered and difficult to read. With Mezmo, we can use the search and replace feature to find and replace all instances of the escaped newline character with an actual newline character:

The "i" button toggles case sensitivity for the regular expression, while the "g" button toggles global or local matching. If global matching is disabled, only the first match in the stream is replaced. Clicking on the check mark performs the replace, and clicking on the "x" reverts it. Now, any current and new syslog messages will be displayed over multiple lines while leaving the actual log data untouched.
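The same substitution can be sketched outside the web app. The Python example below is not Mezmo's implementation; it only illustrates the replacement and the difference between global and first-match-only behavior. It assumes the escaped newline appears in the message as a literal backslash followed by "n", which depends on how your syslog service escapes control characters.

```python
import re

# Assumption: the newline was escaped as a literal backslash followed by "n"
message = "Traceback (most recent call last):\\n  File app.py, line 1\\nValueError: bad input"

# Global replacement: every escaped newline becomes a real newline
restored = re.sub(r"\\n", "\n", message)
print(restored)

# First match only, comparable to turning global matching off
first_only = re.sub(r"\\n", "\n", message, count=1)
print(first_only)
```

With the global replacement the message prints on three lines. With `count=1` only the first escaped newline is replaced and the second stays in the text.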

Conclusion

Despite having been around for decades, regex is still unfamiliar and esoteric territory for many developers. However, its flexibility and ubiquity make it a valuable addition to any developer's toolkit. If you want to learn more about regex or practice creating different expressions, sites like RegExr and regex101 provide interactive editors. Regular-Expressions.info also provides detailed tutorials, examples, and quick start guides.
