Swiftwater Logo

Writing A Robust Little Parser in Swift –Part 1

This entry is part 12 of 17 in the series Swiftwater

ABSTRACT

Lately, I’ve been writing an ONVIF driver for Cocoa, which has helped me to see that Swift is actually an outstanding parser language.

I figured that this would be a good opportunity to show a bit about how we can leverage a couple of Swift traits that make it a remarkably good language for writing robust parsers. 

FIRST, THE SETUP

The Flow

If you look at the flowchart on the right, you’ll see a fairly typical dataflow for parsing communications.

First, you have the data coming in; usually as XML or JSON.

Next, you have some kind of “pre-parsing” engine, which handles converting the input data format to a data format that can be handled by applications written in Swift. A couple of examples could be the built-in Foundation class JSONSerialization, or the third-party framework SOAPEngine (what I use for pre-parsing ONVIF).

The pre-parsing engine will emit a Dictionary with String keys, and various types of values (for example, more Dictionaries, Arrays, or Strings).

Even though the pre-parsing engine has done an outstanding job of interpreting the input data, and handing it to us in a common format, there’s still plenty of work to be done.

We need to sort through a varied set of data, and generate a common format.

We will need to break it down further, until all the incoming data -regardless of its source- will be available in a single, common format.

That will require that we write yet another parser; one that takes in a Swift Dictionary, and emits the data in a form best suited to our needs (for example, as actual concrete class instances, Arrays or Dictionaries with well-defined structure).

For this exercise, I’ll assume that we are getting in XML data, and the resultant common data is a simple Array of String.

This will be both a simpler, and more complex task than most real-world parsers will need to accomplish.

It will be simpler, because we’ll contain our work to parsing just one simple data structure; an array of five strings.

It will be more complicated, because we will be sending in this array in a pretty crazy variety of ways.

Our parser will need to sort through all of these formats, and respond with the same data; no matter what garbage it’s fed.

THE DATA

We will assume that we have a “black box” pre-parsing engine that transforms XML into JSON, but let’s pretend that we don’t have control over which pre-parsing engine the consumer of our parser will use, so the intermediate Dictionary might have different structures for the same input.

We will describe the operation of the pre-parser by specifying the XML input data, and the varieties of JSON that can be emitted by it.

We won’t go over the full dataset in this article (but it will be available as comments in the playground). We’ll just cover one example here.

Let’s say that our pre-parser is presented with the following XML as input:

    <containerElement>
        <arrayElement>Element Value 01</arrayElement>
        <arrayElement>Element Value 02</arrayElement>
        <arrayElement>Element Value 03</arrayElement>
        <arrayElement>Element Value 04</arrayElement>
        <arrayElement>Element Value 05</arrayElement>
    </containerElement>

Depending upon the pre-parser chosen by the developer, we could have any one of these outputs (JSON):

    {"containerElement": [  {"arrayElement": "Element Value 01"},
                            {"arrayElement": "Element Value 02"},
                            {"arrayElement": "Element Value 03"},
                            {"arrayElement": "Element Value 04"},
                            {"arrayElement": "Element Value 05"}
                            ]
    }
    {"containerElement": {"arrayElement": [ "Element Value 01",
                                            "Element Value 02",
                                            "Element Value 03",
                                            "Element Value 04",
                                            "Element Value 05"
                                            ]
                        }
    }
    {"containerElement": [  {"arrayElement": {"value": "Element Value 01"}},
                            {"arrayElement": {"value": "Element Value 02"}},
                            {"arrayElement": {"value": "Element Value 03"}},
                            {"arrayElement": {"value": "Element Value 04"}},
                            {"arrayElement": {"value": "Element Value 05"}}
                            ]
    }

As you’ll see, when we get to writing the code, we’ve taken this variability to ludicrous extremes.

We will start to develop the parser in the next post. For the meantime, here’s a playground with a comment block, describing the inputs.

SAMPLE PLAYGROUND

Not much to see, so far. This is just a huge comment block, outlining the various inputs we’ll be testing.

/*
    <containerElement>
        <arrayElement>Element Value 01</arrayElement>
        <arrayElement>Element Value 02</arrayElement>
        <arrayElement>Element Value 03</arrayElement>
        <arrayElement>Element Value 04</arrayElement>
        <arrayElement>Element Value 05</arrayElement>
    </containerElement>

    <!--
        {"containerElement": [  {"arrayElement": "Element Value 01"},
                                {"arrayElement": "Element Value 02"},
                                {"arrayElement": "Element Value 03"},
                                {"arrayElement": "Element Value 04"},
                                {"arrayElement": "Element Value 05"}
                                ]
        }

 
        {"containerElement": {"arrayElement": [ "Element Value 01",
                                                "Element Value 02",
                                                "Element Value 03",
                                                "Element Value 04",
                                                "Element Value 05"
                                                ]
                            }
        }

        {"containerElement": [  {"arrayElement": {"value": "Element Value 01"}},
                                {"arrayElement": {"value": "Element Value 02"}},
                                {"arrayElement": {"value": "Element Value 03"}},
                                {"arrayElement": {"value": "Element Value 04"}},
                                {"arrayElement": {"value": "Element Value 05"}}
                                ]
        }
    -->

    <containerElement>
        <arrayElement>
            <value>Element Value 01</value>
            <value>Element Value 02</value>
            <value>Element Value 03</value>
            <value>Element Value 04</value>
            <value>Element Value 05</value>
        </arrayElement>
    </containerElement>

    <!--
        {"containerElement": {"arrayElement": [ {"value": "Element Value 01"},
                                                {"value": "Element Value 02"},
                                                {"value": "Element Value 03"},
                                                {"value": "Element Value 04"},
                                                {"value": "Element Value 05"}
                                                ]
                            }
        }
 
        {"containerElement": {"arrayElement": [ "Element Value 01",
                                                "Element Value 02",
                                                "Element Value 03",
                                                "Element Value 04",
                                                "Element Value 05"
                                                ]
                            }
        }
    -->

    <containerElement>
        <arrayElement value="Element Value 01"/>
        <arrayElement value="Element Value 02"/>
        <arrayElement value="Element Value 03"/>
        <arrayElement value="Element Value 04"/>
        <arrayElement value="Element Value 05"/>
    </containerElement>

    <!--
        {"containerElement": [  {"arrayElement": {"attributes": [{"value": "Element Value 01"}]}},
                                {"arrayElement": {"attributes": [{"value": "Element Value 02"}]}},
                                {"arrayElement": {"attributes": [{"value": "Element Value 03"}]}},
                                {"arrayElement": {"attributes": [{"value": "Element Value 04"}]}},
                                {"arrayElement": {"attributes": [{"value": "Element Value 05"}]}}
                                ]
        }

        {"containerElement": [  {"arrayElement": {"attributes": {"value": "Element Value 01"}}},
                                {"arrayElement": {"attributes": {"value": "Element Value 02"}}},
                                {"arrayElement": {"attributes": {"value": "Element Value 03"}}},
                                {"arrayElement": {"attributes": {"value": "Element Value 04"}}},
                                {"arrayElement": {"attributes": {"value": "Element Value 05"}}}
                                ]
        }
    -->

    <containerElement>
        <arrayElement index="0">Element Value 01</arrayElement>
        <arrayElement index="1">Element Value 02</arrayElement>
        <arrayElement index="2">Element Value 03</arrayElement>
        <arrayElement index="3">Element Value 04</arrayElement>
        <arrayElement index="4">Element Value 05</arrayElement>
    </containerElement>

    <!--
        {"containerElement": [  {"arrayElement": [{"attributes": [{"index": 0}]}, {"value": "Element Value 01"}]},
                                {"arrayElement": [{"attributes": [{"index": 1}]}, {"value": "Element Value 02"}]},
                                {"arrayElement": [{"attributes": [{"index": 2}]}, {"value": "Element Value 03"}]},
                                {"arrayElement": [{"attributes": [{"index": 3}]}, {"value": "Element Value 04"}]},
                                {"arrayElement": [{"attributes": [{"index": 4}]}, {"value": "Element Value 05"}]}
                                ]
        }

        {"containerElement": [  {"arrayElement": [{"attributes": [{"index": "0"}]}, {"value": "Element Value 01"}]},
                                {"arrayElement": [{"attributes": [{"index": "1"}]}, {"value": "Element Value 02"}]},
                                {"arrayElement": [{"attributes": [{"index": "2"}]}, {"value": "Element Value 03"}]},
                                {"arrayElement": [{"attributes": [{"index": "3"}]}, {"value": "Element Value 04"}]},
                                {"arrayElement": [{"attributes": [{"index": "4"}]}, {"value": "Element Value 05"}]}
                                ]
        }

        {"containerElement": [  {"arrayElement": [{"attributes": {"index": "0"}}, {"value": "Element Value 01"}]},
                                {"arrayElement": [{"attributes": {"index": "1"}}, {"value": "Element Value 02"}]},
                                {"arrayElement": [{"attributes": {"index": "2"}}, {"value": "Element Value 03"}]},
                                {"arrayElement": [{"attributes": {"index": "3"}}, {"value": "Element Value 04"}]},
                                {"arrayElement": [{"attributes": {"index": "4"}}, {"value": "Element Value 05"}]}
                                ]
        }

        {"containerElement": [  {"arrayElement": [{"attributes": [{"index": "0"}, {"value": "Element Value 01"}]}]},
                                {"arrayElement": [{"attributes": [{"index": "1"}, {"value": "Element Value 02"}]}]},
                                {"arrayElement": [{"attributes": [{"index": "2"}, {"value": "Element Value 03"}]}]},
                                {"arrayElement": [{"attributes": [{"index": "3"}, {"value": "Element Value 04"}]}]},
                                {"arrayElement": [{"attributes": [{"index": "4"}, {"value": "Element Value 05"}]}]}
                                ]
        }
    -->

    <containerElement>
        <arrayElement row="0" value="Element Value 01"/>
        <arrayElement row="1" value="Element Value 02"/>
        <arrayElement row="2" value="Element Value 03"/>
        <arrayElement row="3" value="Element Value 04"/>
        <arrayElement row="4" value="Element Value 05"/>
    </containerElement>

    <!--
        {"containerElement": [  {"arrayElement": {"attributes": [{"row": 0}, {"value": "Element Value 01"}]}},
                                {"arrayElement": {"attributes": [{"row": 1}, {"value": "Element Value 02"}]}},
                                {"arrayElement": {"attributes": [{"row": 2}, {"value": "Element Value 03"}]}},
                                {"arrayElement": {"attributes": [{"row": 3}, {"value": "Element Value 04"}]}},
                                {"arrayElement": {"attributes": [{"row": 4}, {"value": "Element Value 05"}]}}
                                ]
        }

        {"containerElement": [  {"arrayElement": [{"attributes": [{"row": 0}, {"value": "Element Value 01"}]}]},
                                {"arrayElement": [{"attributes": [{"row": 1}, {"value": "Element Value 02"}]}]},
                                {"arrayElement": [{"attributes": [{"row": 2}, {"value": "Element Value 03"}]}]},
                                {"arrayElement": [{"attributes": [{"row": 3}, {"value": "Element Value 04"}]}]},
                                {"arrayElement": [{"attributes": [{"row": 4}, {"value": "Element Value 05"}]}]}
                                ]
        }

        {"containerElement": [  {"arrayElement": {"attributes": [{"row": "0"}, {"value": "Element Value 01"}]}},
                                {"arrayElement": {"attributes": [{"row": "1"}, {"value": "Element Value 02"}]}},
                                {"arrayElement": {"attributes": [{"row": "2"}, {"value": "Element Value 03"}]}},
                                {"arrayElement": {"attributes": [{"row": "3"}, {"value": "Element Value 04"}]}},
                                {"arrayElement": {"attributes": [{"row": "4"}, {"value": "Element Value 05"}]}}
                                ]
        }

        {"containerElement": [  {"arrayElement": [{"attributes": [{"row": "0"}, {"value": "Element Value 01"}]}]},
                                {"arrayElement": [{"attributes": [{"row": "1"}, {"value": "Element Value 02"}]}]},
                                {"arrayElement": [{"attributes": [{"row": "2"}, {"value": "Element Value 03"}]}]},
                                {"arrayElement": [{"attributes": [{"row": "3"}, {"value": "Element Value 04"}]}]},
                                {"arrayElement": [{"attributes": [{"row": "4"}, {"value": "Element Value 05"}]}]}
                                ]
        }
    -->
 */