Parsing EDI with LINQ

Reading EDI files is simpler than it first looks. For a start, the file is broken up by two kinds of delimiters: ~ usually separates each segment, and * separates each element within a segment. Here's a sample EDI file for the 850 message (Purchase Order):

This sample is "unwrapped", meaning I've put a carriage-return/linefeed at the end of each segment so it's easier to read. A "wrapped" file would omit the CR/LF and look like one long string. Each segment begins with a segment ID (a two- or three-character code), followed by one or more elements that contain data.

The first segment, called the Interchange Control Header (the line beginning with "ISA"), is always fixed-width. This lets you discover what the delimiters are for each file: counting from 1, character 104 is always the element delimiter, 105 is the sub-element delimiter, and 106 is the segment delimiter.
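The fixed positions translate directly into string indexes. Here is a minimal C# sketch of that lookup; the class and method names are my own, chosen for illustration, and it assumes the file really does start with a well-formed 106-character ISA header:

```csharp
using System;

public static class IsaDelimiters
{
    // Reads the three delimiters from the fixed-width ISA header.
    // Characters 104, 105 and 106 (counting from 1) are indexes 103, 104 and 105.
    public static (char Element, char SubElement, char Segment) Read(string edi)
    {
        if (edi == null || edi.Length < 106 || !edi.StartsWith("ISA", StringComparison.Ordinal))
            throw new FormatException("Expected a 106-character ISA header at the start of the file.");

        return (edi[103], edi[104], edi[105]);
    }
}
```

Returning a tuple keeps the three values together, so the caller can write `var (elem, sub, seg) = IsaDelimiters.Read(text);` and pass them on to whatever does the splitting.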

The EDI guide given to you by your trading partner will tell you what each segment is used for and what each element means to them. So already we can write a short piece of LINQ code to read an EDI file into a collection of segments and elements:
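A minimal C# sketch of such a query follows. The wrapper class, the `Describe` method and the `ElemDelimiter` name are assumptions made for illustration. Because an anonymous type cannot leave the method that creates it, the sketch returns a one-line-per-segment text summary instead of the collection itself:

```csharp
using System;
using System.IO;
using System.Linq;

public static class EdiQuerySketch
{
    public static string Describe(Stream InputStream)
    {
        // Read the whole file into a single string.
        string edi;
        using (var reader = new StreamReader(InputStream))
            edi = reader.ReadToEnd();

        if (edi.Length < 106)
            throw new FormatException("File is too short to contain an ISA header.");

        // Delimiters sit at fixed positions in the ISA header.
        char SegDelimiter = edi[105];
        char ElemDelimiter = edi[103];

        var segments =
            from raw in edi.Split(SegDelimiter)
            let seg = raw.Trim('\r', '\n')          // drop CR/LF left by "unwrapped" files
            where seg.Length > 0                    // ignore the empty tail after the last delimiter
            let parts = seg.Split(ElemDelimiter)
            select new { SegID = parts[0], Elements = parts.Skip(1).ToArray() };

        return string.Join("\n",
            segments.Select(s => s.SegID + ": " + s.Elements.Length + " element(s)"));
    }
}
```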

The above assumes that InputStream is an IO.Stream containing the file you want to read. First we load it into a StreamReader so that we can fill a string with the entire contents of the file. Then we discover the segment and element delimiters (indexes 105 and 103 of the zero-based string, which correspond to characters 106 and 104 when counting from 1). The next step is to use LINQ to break the string up by segment (using Split(SegDelimiter)) and then create an anonymous type holding the segment ID and an array of elements.

We call .ToArray() at the end of splitting the segment into elements so that we can address them by index position later. Split already returns an array, so the call is only needed because I wanted to be tidy and Skip(1) past the first item, which identifies the segment and is already stored in "SegID"; Skip returns a plain sequence rather than an array.

Now you have an IEnumerable of anonymous-type objects, each holding a segment ID and its addressable elements. This might already be enough for your needs, but we'll assume the data would be more useful translated into a hierarchy. Many shops use XML as their intermediate format, so we'll convert to that.

However, we don't want to merely wrap each segment and element in angle brackets; we'd like some meaningful structure. EDI is a hierarchical format, much like XML, but its structure isn't explicit the way XML's is. In this tutorial I'm going to model the implicit structure of EDI in XML and use it to drive a conversion algorithm. The mapping below defines the structure of a common EDI message type, the 850 (Purchase Order), and will serve as a reusable configuration that we can tweak for other messages and trading partners:

The above divides the message into ranks and gives names for each segment and element. We'll use the ranks to control how the data is nested in the translated message, and the names will be used for the XML element names.
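To make the idea of ranks and names concrete, here is a hypothetical, cut-down mapping loaded as an XElement in C#. The `Map`, `Segment` and `Element` tag names and the `ID`, `Rank` and `Name` attributes are my own assumptions for illustration, not a fixed schema, and a real mapping would cover every segment in your trading partner's guide:

```csharp
using System.Linq;
using System.Xml.Linq;

public static class MappingSketch
{
    // Rank 1 segments sit directly under the root; rank 2 segments nest
    // under the most recent rank 1 segment, and so on.
    public static readonly XElement Map = XElement.Parse(@"
<Map Name=""PurchaseOrder"">
  <Segment ID=""BEG"" Rank=""1"" Name=""Header"">
    <Element Name=""Purpose"" />
    <Element Name=""OrderType"" />
    <Element Name=""PONumber"" />
  </Segment>
  <Segment ID=""PO1"" Rank=""1"" Name=""LineItem"">
    <Element Name=""LineNumber"" />
    <Element Name=""Quantity"" />
    <Element Name=""UnitOfMeasure"" />
    <Element Name=""UnitPrice"" />
  </Segment>
  <Segment ID=""PID"" Rank=""2"" Name=""Description"">
    <Element Name=""DescriptionType"" />
  </Segment>
</Map>");

    // Looks up the nesting rank for a segment ID; null means the segment is not mapped.
    public static int? RankOf(string segId) =>
        (int?)Map.Elements("Segment")
                 .FirstOrDefault(s => (string)s.Attribute("ID") == segId)
                 ?.Attribute("Rank");
}
```

Keeping the element names positional (first `Element` names the first data element, and so on) keeps the file short, at the cost of needing placeholder entries for any positions you want to skip.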

A recursive function is a natural way to traverse a hierarchy, but before we write one we'll take a moment to create a formal class to store our segments in.
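Such a class needs little more than the two members the anonymous type already had. A minimal C# sketch, in which the constructor shape and the `Parse` helper are assumptions of mine:

```csharp
using System;
using System.Linq;

public sealed class Segment
{
    public string SegID { get; }
    public string[] Elements { get; }

    public Segment(string segId, string[] elements)
    {
        SegID = segId ?? throw new ArgumentNullException(nameof(segId));
        Elements = elements ?? throw new ArgumentNullException(nameof(elements));
    }

    // Builds a Segment from one raw segment string such as "BEG*00*SA*PO123".
    public static Segment Parse(string raw, char elementDelimiter)
    {
        var parts = raw.Trim('\r', '\n').Split(elementDelimiter);
        return new Segment(parts[0], parts.Skip(1).ToArray());
    }

    public override string ToString() => SegID + " (" + Elements.Length + " elements)";
}
```

With a named type, the earlier query can end in `select Segment.Parse(seg, ElemDelimiter)` and the result can be passed between methods, which an anonymous type does not allow.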

The recursive function uses XStreamingElement to return an XML tree that's built dynamically from more LINQ queries. You pass it an XElement containing our mapping configuration plus the collection of Segments we parsed from the EDI file.
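One way such a function could look is sketched below in C#. It is not a definitive implementation: it assumes a hypothetical mapping made of `Segment` entries with `ID`, `Rank` and `Name` attributes and positional `Element` children, and it assumes each segment nests under the nearest preceding segment of lower rank. Segments missing from the mapping (ISA, GS and so on) are simply skipped:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Same shape as the segment class described in the text; repeated so the block compiles on its own.
public sealed class Segment
{
    public string SegID { get; }
    public string[] Elements { get; }
    public Segment(string segId, string[] elements) { SegID = segId; Elements = elements; }
}

public static class EdiToXml
{
    public static XStreamingElement ToXml(XElement map, IEnumerable<Segment> segments)
    {
        var defs = map.Elements("Segment").ToDictionary(d => (string)d.Attribute("ID"));
        var mapped = segments.Where(s => defs.ContainsKey(s.SegID)).ToList();
        return new XStreamingElement((string)map.Attribute("Name"), Nest(mapped, defs, 1));
    }

    // Emits one XML element per segment at the current rank and recurses into
    // the run of deeper-ranked segments that immediately follows it.
    private static IEnumerable<XStreamingElement> Nest(
        List<Segment> segs, Dictionary<string, XElement> defs, int rank)
    {
        for (int i = 0; i < segs.Count; )
        {
            var seg = segs[i];
            var def = defs[seg.SegID];

            var children = segs.Skip(i + 1)
                               .TakeWhile(s => (int)defs[s.SegID].Attribute("Rank") > rank)
                               .ToList();

            // Pair each mapped name with the value at the same position;
            // values beyond the last mapped name are dropped.
            var fields = def.Elements("Element")
                            .Zip(seg.Elements, (e, value) => new XElement((string)e.Attribute("Name"), value));

            yield return new XStreamingElement(
                (string)def.Attribute("Name"), fields, Nest(children, defs, rank + 1));

            i += 1 + children.Count;
        }
    }
}
```

Because both `Nest` and the `fields` query are deferred, nothing below the root is built until the XStreamingElement is serialized, for example with `ToString()` or `Save(...)`.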

The function will convert the segments into something that might look a bit like this: