[ MUSIC ]
POWERS: Welcome to the developerWorks series on Getting Started
with the Data Format Description Language, or DFDL, for short.
You can visit the developerWorks DFDL page at ibm.biz/startdfdl.
That's ibm.biz/startdfdl.
This first video in the series is presented by Steve Hanson.
He is the architect for the IBM DFDL component and he's also the co-chair
of the Open Grid Forum DFDL Working Group.
In this first video, Steve introduces you to the motivations and design goals for DFDL
and shows you some of the key features of the specification.
He also describes the features of IBM's DFDL component
and discusses the Open Source implementation of DFDL called Daffodil.
Finally, he gives you a quick guided tour of the IBM DFDL component
and an example with IBM Integration Bus.
Okay, here's Steve to take you through the presentation.
HANSON: Good morning, good afternoon, good evening, everybody.
Thank you for joining the call, and thanks to Guy for giving me another opportunity
to push the word out about DFDL, so to speak.
Okay, so we have a couple of different presentations on DFDL at the moment.
We have the introduction presentation, which has grown over time and talks
about all the new things that have happened in the past year or so.
And that's what we're going to be talking about today.
So, it's an introduction to DFDL but also an update on what's been happening.
But then we also have a more advanced session which was based on a session I gave
at Impact this year, which talks more about how to go about modeling data using DFDL.
So, today, we're going to be doing the intro but if there's any demand
for the more advanced session, then please let Guy know and we can see about shooting
that a bit later in the year, perhaps.
Okay, so we're going to talk a bit very briefly about why DFDL is useful and talk about DFDL
as a standard, then a little bit about the IBM DFDL component that we've developed
which is an implementation of the specification.
And that's the component that is used by WMB v8,
IIB v9 and a few other products as well these days.
And then we'll have a Q&A at the end if we have time.
Okay. So, why has this thing called Data Format Description Language --
or, DFDL for short -- come about?
So, if you look at much of the data in the world, we have lots of XML data floating around.
We have lots of JSON data these days.
But still I would say the bulk of the data that exists in the world and flows
through Broker and Integration Bus and so on is not XML or JSON; it's a mixture of text and binary
that comes from COBOL programs, standards like SWIFT and HL7, and that sort of thing.
And for that sort of data, there is no open standard that describes it, all right.
So every time someone wants to come along and create some sort of machine-readable description
of a format, they pretty much have to come up with some sort of custom parser
or some special language that they've developed.
All right, nothing has really existed out there for text and binary data.
There have been a few attempts, things like ASN.1 if you've ever come across that.
But that's really a prescriptive standard.
It dictates the format on the wire that you have to put your data in, okay.
And if your data isn't already in that format, then, hey, that's no good.
So, there really hasn't been a way of taking an open standard and saying,
I've got some data here, all right.
Or, this is the wire format of my data or I've got a specification written down in a PDF
or something and then being able to produce a model for that which obeys an open standard
and therefore can be shared by whoever needs to consume that.
Until, that is, of course, DFDL came along, because that's the purpose of DFDL,
it is to provide an open standard to describe text and binary data.
And it's a big issue for IBM as well as the industry.
Each time IBM acquires another company that is in the business of modeling
and parsing text and binary data, guess what?
They do it in their own way, and we have another issue with trying to integrate
that model into the wider IBM world.
And there's some examples on the slide there.
The ones that we're familiar with, Broker's MRM message sets
and TX Type Trees, to name just a couple.
All right.
So what's DFDL?
So, the phrase I used for it, it's a universal, shareable, non-prescriptive description
for general text and binary data formats.
I know it's a bit of a mouthful, but that kind of says what it is.
Let's have a look at DFDL itself.
So, as I say, it's an open standard, okay, it's not proprietary, it's not owned by IBM,
it's not owned by any particular company.
It is an open standard from an organization you may not have heard
of called the Open Grid Forum.
The Open Grid Forum was an amalgamation of standards bodies that emerged about 10 years ago,
and it was all concerned with the area of grid computing.
And the reason that the DFDL standard came out of there was because people
at the Open Grid Forum were wrestling with the problem of trying to develop APIs
that could work on data spread across multiple machines across the grid.
And for that, those APIs needed to understand the formats and therefore,
they needed a way of defining those formats.
They had XML schema for XML, that's fine, but for other text and binary data
which is the majority of data, there wasn't anything they could use,
and that's why they embarked upon this effort called DFDL.
And that process started about 2004; that's when I myself
and IBM got involved, right at the start of this.
So, you know, it took quite a long time in order
to get the standard to the point where it is now.
So it's at version 1.0; it's a proposed recommendation, which means it's been published
and is essentially waiting for verification by at least two implementations in order to make sure
that the standard is indeed implementable and it works in practice.
Okay. I will talk more about those implementations in a little bit.
Now, DFDL is a way of describing data, all right; it's a modeling language and a set
of rules for processing by a parser and a serializer, right.
It's not actually a data format itself, all right.
You don't say my data is in DFDL format; you say my data is described by DFDL.
The point is that your data format could be anything, all right,
and you're using DFDL to describe it.
It's a powerful language.
We're all IBMers on the phone, so if I said to you, it's a bit like taking what you had
with MRM and taking what you have with TX type trees, smashing it all together
and doing a load more on top, then you kind of get some idea of the power that DFDL has
in modeling text and binary data.
All right, you can handle all the different things you're likely to come across.
And the language has been designed to allow high-performing implementations as well;
we deliberately did not adopt features that would prevent that.
And of course, it leaves the power in your hands as well.
You can define whatever data format you like that's right
for the job, and use DFDL to process it.
Let's first have a look at the technology that's used to underpin DFDL.
So, when the working group at OGF started out all that time ago,
it had to decide on a language: whether it would invent a new language or whether it would try
to adopt other practices that were going around.
And if you look at really what DFDL has to do, it has to describe the logical structure
of that business data you're going to model.
And also, how that business data is laid out on the wire, i.e., its physical format.
So it has to describe both its logical format and its physical format.
Now, it was noticed that XML schema although designed
for modeling XML also did a pretty good job of describing any logical data, okay.
So the DFDL working group decided to use XML schema as the basis for that.
Now, that might sound familiar to you; of course, MRM made a similar decision back in 2001
to adopt XML schema as the logical model as well, okay.
So, that was kind of no accident.
I think people will realize that schema is a good way of describing logical data.
So that's what we get in DFDL.
In fact, we use a subset of XML schema and a subset
of its type system, just to keep things a bit simple.
We don't need things like attributes.
We don't need some other things like substitution groups, and some
of the data types that you get in XML schema.
So we had a subset of the schema and its types.
So that gives us the logical model; how about the physical side?
And that's where schema annotations come in, all right?
If you've ever looked under the covers in the MRM,
you'll see in those MXSD files XML schema with annotations, all right.
It's exactly the same with DFDL, okay; it's XML schema with annotations.
The big difference is that with DFDL, those annotations follow an open standard,
and that's the big difference between, say, DFDL and something like MRM,
which is obviously IBM proprietary.
And the result of that is a DFDL schema, all right.
So, it describes some non-XML physical format.
So the language is being designed to keep the simple cases simple but also
to offer a lot of power if you need to use it.
And the annotations, we've tried to make much more human readable than the old MRM annotations.
And you can judge for yourself in a moment when we have a look at some.
And in conjunction with the DFDL schema, you need something called a DFDL Processor;
that's something that can take the data and use the schema to parse it, and then vice versa:
take the logical information and serialize it using the schema back into the wire format.
All right, so, lots of intelligence in a DFDL Processor,
and it can also do some validation as well.
Now, on to Slide 7, we're going to have a look at an example of some text data
and see what the DFDL schema for this looks like.
And we've got here some simple delimited text data.
You can see there's something called intval, something called fltval and you can see
that the intval is a text integer and fltval is a text float.
Now, as well as those data values in there, there's also some other things which get known
by the general term of markup or syntax.
In DFDL, they're called delimiters.
All right.
So we've got a couple of examples there.
We can see that we have initiators and we have a separator that separates those two field values.
okay? The last type of delimiter in DFDL is called a terminator.
That is the data we've got.
Now, we're going to try and put together a simple little DFDL schema to model that.
Let's have a look at what it might look like.
On to Slide 8.
This slide is going to build up, so initially we can see what is in fact an XML schema, okay?
So this is really defining the logical structure of that data.
So, it's a complex type.
[INAUDIBLE] my numbers, which is a sequence of two elements, an integer and a float,
which I think describes the logical structure of that data, okay.
Now, how does DFDL fit into the picture?
Well, [INAUDIBLE] like that.
You can see in that I've added in some DFDL annotations, all right.
They're appinfo annotations and within them,
you can see there's something called DFDL element and a whole bunch
of things called DFDL properties.
And those properties are the things that describe the physical format of that data.
And so you can see, for example, we have things that say representation text, encoding ascii,
lengthKind delimited, which means it's variable-length data;
we're going to scan and find out how long it is by searching for delimiters.
And you can see we have initiators in there as well.
And something called textNumberPattern to describe the numeric layout of that data.
Okay, so that pretty much describes all that data you saw there apart from one thing,
and that one thing is a separator.
A separator is a property of the sequence; because it's something that happens
between the elements, it's actually owned by the sequence.
You can see there we have a DFDL sequence annotation,
and that's where the separator and its encoding is specified.
Okay. Now, I said that was supposed to be human readable.
It's sort of human readable,
if you're one of the few people in the world that can read that sort of thing.
So what we've done is we've got an alternative syntax.
So on Slide 9, you can see we have something called a short form DFDL schema.
So, those annotations that we saw before: the properties on them have actually been moved
and put on the schema objects themselves.
That makes it much more compact and I think much more readable as well.
And that's the format you'll tend to see when people write schemas.
They are exactly equivalent.
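For illustration, a short-form DFDL schema for data of this shape might look like the sketch below. The element names intval and fltval come from the talk; the concrete data sample and the specific property values (ASCII encoding, comma separator, colon-suffixed initiators) are assumptions for the example, not the actual slide content.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <!-- Models delimited text such as:  intval:42,fltval:98.6  (illustrative data) -->
  <xs:element name="myNumbers">
    <xs:complexType>
      <!-- The separator sits on the sequence, because it occurs between the elements -->
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
        <xs:element name="intval" type="xs:int"
                    dfdl:representation="text" dfdl:encoding="ascii"
                    dfdl:lengthKind="delimited" dfdl:initiator="intval:"/>
        <xs:element name="fltval" type="xs:float"
                    dfdl:representation="text" dfdl:encoding="ascii"
                    dfdl:lengthKind="delimited" dfdl:initiator="fltval:"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note how the properties (representation, encoding, lengthKind, initiator, separator) appear directly as attributes on the schema objects, which is what makes the short form compact.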
Okay, we talked about DFDL Processor.
All right, so we have our schema, what does the processor do?
The processor, it takes some data.
It's going to parse it, all right, going to read that data stream.
It's going to use the schema to understand it and that's going
to create something called a DFDL infoset.
Now, if you're familiar with the XML infoset, that's something you'd understand.
It's essentially just a logical set of information,
a bit like an XML DOM, that kind of thing.
So, the DFDL spec doesn't actually say what form that must take;
it just says what it must contain.
I've shown it in the slide as being rendered as XML; it could have been rendered in any way.
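For the intval/fltval example earlier, one possible XML rendering of the infoset might be the following; the element names and values are illustrative, since the slide itself isn't reproduced here.

```xml
<myNumbers>
  <intval>42</intval>
  <fltval>98.6</fltval>
</myNumbers>
```

Notice that all the physical markup (initiators, separators) is gone; only the logical information remains.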
And the serializer processor is the opposite.
It takes the infoset, uses the schema and then writes the same data stream back again.
That's essentially a DFDL Processor, right?
So we have the schema and we have the processor.
So, Slide 11 shows you all the different features of the language.
I'll just skim through these very quickly.
It can support all sorts of different text data types and binary data types, fixed length
and variable length data, even bidirectional text.
We can go right down to the level of bits, so we can start to model
down at the bit level, not just bytes and characters.
We can have ordered and unordered content, default values, nil values.
Arrays that are either fixed or variable in length.
A very powerful feature of the language is that it's actually got XPath 2.0 expressions built
into it, which enables you to do things like say, okay, I'm an array and the number
of occurrences in the array depends on a field earlier in the data.
And you simply use an XPath expression to point back at that earlier field in the data.
It's a very, very powerful feature of the language.
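As a sketch of that feature: in DFDL the occurrence count of an array element can be driven by an expression over an earlier field, using the occursCountKind and occursCount properties. The field names here are hypothetical, and the other properties are kept minimal for readability.

```xml
<xs:sequence dfdl:separator=",">
  <!-- 'count' holds how many 'item' occurrences follow (hypothetical field names) -->
  <xs:element name="count" type="xs:int"
              dfdl:representation="text" dfdl:lengthKind="delimited"/>
  <!-- The expression points back at the earlier 'count' field -->
  <xs:element name="item" type="xs:string" maxOccurs="unbounded"
              dfdl:lengthKind="delimited"
              dfdl:occursCountKind="expression"
              dfdl:occursCount="{ ../count }"/>
</xs:sequence>
```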
Another powerful feature is something called speculative parsing.
So, this means that the parser will try to resolve choices
and optional content by attempting multiple parses.
So for example, if you have a choice and you've got three different branches to the choice,
the parser will by default try and parse the first branch.
If that doesn't work, it will then backtrack and it will try the second.
Similarly, for optional fields, it will try to parse an optional field.
If that fails, it'll say, ah, well, that wasn't there,
and then go on to the next thing.
All right, so it tries to stay on its feet much more so than the MRM parser could ever do.
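A hedged sketch of the kind of choice the speculative parser resolves: with default behavior the parser attempts the first branch; if the data doesn't match (say, the initiator isn't found), it backtracks and tries the next branch in order. The branch names and initiators here are hypothetical.

```xml
<xs:choice>
  <!-- The parser tries each branch in document order, backtracking on failure -->
  <xs:element name="order"   type="xs:string"
              dfdl:initiator="ORD:" dfdl:lengthKind="delimited"/>
  <xs:element name="invoice" type="xs:string"
              dfdl:initiator="INV:" dfdl:lengthKind="delimited"/>
  <xs:element name="receipt" type="xs:string"
              dfdl:initiator="RCP:" dfdl:lengthKind="delimited"/>
</xs:choice>
```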
Validation is to XML schema 1.0 rules.
Basically, we're based on XML schema, and we have all the validation rules there, so why not validate?
We have a scoping mechanism which enables you to define common property values
and then apply those property values at multiple points across the schema.
So, if you think about the MRM, we had that thing called a message set file, right;
there were some properties that you could specify at the message set level
that applied all over the rest of the message set.
This is a very, very similar scheme, all right, it means that you declare blocks of properties
and you can reuse them wherever you want.
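A minimal sketch of that scoping mechanism: dfdl:defineFormat declares a named block of property values, and a dfdl:format with a ref applies it, here as a schema-wide default. The format name and property values are illustrative; a real schema-level default typically sets many more properties than this.

```xml
<xs:annotation>
  <xs:appinfo source="http://www.ogf.org/dfdl/">
    <!-- Declare a reusable block of property values -->
    <dfdl:defineFormat name="commonText">
      <dfdl:format representation="text" encoding="ascii"
                   lengthKind="delimited" initiator="" terminator=""/>
    </dfdl:defineFormat>
    <!-- Apply it as the default for the whole schema -->
    <dfdl:format ref="commonText"/>
  </xs:appinfo>
</xs:annotation>
```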
So, a question that's very commonly asked is when shall I use DFDL?
Okay, I've said it's a general purpose text and binary modeling and parsing tool,
but what is DFDL's actual real sweet spot?
When should you really use it?
So, the answer to that is if you've got a specification
of the data format on the wire, all right.
So, for example, you have a spreadsheet or you've got a PDF
or a Word document that actually tells you the layout of that data.
And/or you've also got wire examples of that data format as well, all right,
actual examples of the data, and you need to model those, all right?
So it's very much driven by what the data looks like on the wire.
So, DFDL can be used to model the sort of the data you get out of COBOL,
or C or Assembler programs, any kind of text data with delimiters
such as comma separated values or anything beyond that, you know, things like standards
for industry like SWIFT and HL7, EDIFACT, X12, et cetera.
And also, binary standards as well, things like ISO8583 with those bitmaps at the start,
and TLog, which is the data you get from 4690 point-of-sale systems.
Okay, so all those kind of things use DFDL to model.
I don't recommend, though, that you use DFDL to model XML, right?
We already have XML parsers and we have XML schemas that describe the structure
and you can validate [INAUDIBLE].
There's no need to try and model XML itself using DFDL.
JSON is interesting, right?
We already have JSON parsers, and the JSON parsers are in WMB and IIB, for example.
All right.
So, the recommendation is don't really use DFDL for modeling JSON; use the JSON parser.
The next question that comes up with JSON is, well, I need a model because I want to use it
with graphical data mapper, for example.
Well, there is an initiative in the industry to design JSON schema
and that's the long-term direction of that.
Meantime, the best way of handling that is to parse the data with the JSON parser, transform
it to XML and use the XML instance to derive a pure XML schema,
and then use that with the mapper.
That's a better approach than using DFDL to model it.
And the other thing that I don't recommend DFDL for is some of these formats
like Google Protocol Buffers or HDF5, all right.
These are quite complex serialization formats.
But the key point about them is as a user, you never look at the wire format.
Right? You always interact with the data via APIs.
So with GPB, for example, you define the logical structure of the data
in a single .proto file and then you use APIs, which exist in loads
of different languages, to get at that data.
What GPB does with the data when it puts it on the wire is not really of interest to you.
You shouldn't really go poking around in there.
All right.
And for those kind of formats, we recommend as well, don't use DFDL.
So, really DFDL is about data on the wire.
If you own that wire format, it's the wire format that you need to model, using DFDL.
And the last thing is that DFDL does, as I say, have this path expression language in it.
But we don't recommend that you use that for implementing sort of complex validation rules.
That's not really what DFDL expressions are designed for,
and it's kind of pushing the boundaries of DFDL a bit.
All right, so DFDL adoption.
So, there's an IBM DFDL processor and there's also an open source DFDL processor
called Daffodil.
And as I said at the beginning, we need at least two implementations of the spec in order
to fully ratify the standard and move it
through to what's called full recommendation status.
And so, these are the two that we currently have.
So, the IBM component -- I'll talk more about what that contains in a moment --
but that's currently used in Broker v8, Integration Bus 9.
It's used in the Rational Integration Tester suite of tools as well;
that's the old Green Hat products that IBM acquired, they now ship with DFDL.
And as of 2Q this year, InfoSphere MDM v11 also ships with DFDL.
The open source Daffodil parser, that's available as an alpha release at the moment.
It is only a parser.
There's no serializer.
But it's there and coming together, which is good news.
And something that is very interesting.
The next slide, Slide 14, what we're trying to do to get DFDL out there in the...sort
of the mouths of babes and sucklings, if you want,
is we started a web community called DFDL schemas.
Now, this is a free public repository for DFDL models that have a good degree of reuse.
All right, it's hosted on GitHub, which is the really popular website for doing open
source development.
And it's free to read, but you have to register if you want to collaborate on this.
IBM has started to put some content up there.
You can see the content we have on the righthand side.
We have HL7 v2.7 schemas, ISO8583 1987 schemas and 4690 TLog schemas up there so far.
All right, and more things will appear.
All right, just to point out that the schemas you get up there are unsupported by IBM.
All right, they are for use by anybody who wants to but it's not the same as acquiring it
as part of, say, a connectivity pack or something like that.
I'll talk more about that a bit later on.
Okay, to get started with DFDL, the easiest thing to do probably is just to put DFDL
into your favorite search engine and pretty high
up on the hit list you get the DFDL Wikipedia page.
And that has a good load of links that take you off to the specification
at OGF and other things like that.
Okay, so, that's the OGF DFDL standard.
So, now let's take a look at the actual IBM DFDL component.
So this was designed -- so, this is slide 18.
This was designed right from the outset as an embeddable component.
Although it's produced by effectively the same development team that produces Broker,
you know, Integration Bus and so on,
it was designed as an embeddable component for use
by as many products in IBM as want to take it.
Okay, it was first shipped in 2011 at v1.0 and it's now at v1.1.
So, it consists of a DFDL processor, not surprisingly, right.
We have a parser and a serializer.
In fact, we have two languages for our parser and serializer: we have Java and we have C.
And that's really because the different consumers
of IBM DFDL need those two languages --
so, for example Broker uses the C parser but MDM uses Java.
It's a streaming parser, which means it can take in large files of data; so,
you just keep feeding it buffers of data and it keeps working its way through them.
That makes it useful for reading big files or streaming over TCP/IP.
It's an on demand parser, which means it will only parse as much as you ask.
So if you're only interested in looking at the first hundred bytes of a one-meg data buffer,
then it will only parse the first hundred bytes.
And it's speculative, as I described before.
So it backtracks and tries different alternatives when it comes across things
like choices and optional fields.
To get good performance, we actually pre-compile the DFDL schema into a more compact form
which we call the Grammar but that's all completely hidden
from you; you never actually see that.
And the parser itself emits SAX-like events.
So, it's a bit like an XML SAX parser.
You get all these events fired at you, you write a callback method
and then you can do what you want with those.
We have some really nice tooling, Eclipse based tooling for this.
There's a nice schema editor; we'll see it shortly.
We have some wizards that help you get started with common formats and some importers as well
for COBOL and C. And the neatest bit -- and I'll show you this later as well --
is the ability to debug your DFDL schema using real data all within the tooling itself.
So, IBM DFDL, how much of the DFDL specification does that implement?
Okay, the specification is pretty large, it's well over 200 pages now.
We implement probably about 85 percent of the spec, I would say.
There are still a few features that we don't do.
And there are some spec errata as well that we haven't
yet implemented, but most of the stuff is there.
We will get to 100 percent complete; it will just take a bit of time to get there,
probably a year or so I would think.
But eventually we will have an IBM DFDL which is 100 percent of the specification.
All right, Slide 19.
What's new recently in IBM DFDL?
So, if you go back to 1.0, that shipped in Broker version 8, and since then we've moved
through a few different point releases and we're now at v1.1.
So, some of the things we've added since 1.0:
the ability to extract data using prefixed lengths or regular expressions.
It can handle binary data with delimiters.
You can define your own variables,
which is very useful with the expression language.
We fully implement default values when serializing.
We don't yet implement all the XPath functions, but we've added some more in.
There's some nice features added to the tooling as well in 1.1.
We've got keyboard shortcuts in the editor.
We've got multi-byte encoding support in the DFDL debugger.
Lots of copy and paste has been added, and we can generate sample values as well.
And each time we bring out an important release we have a continual increase
in the performance of IBM DFDL as well.
Next on the list is handling unordered sequences; that's something we don't yet support
but need to do for standards like FIX.
And also something called direct dispatch for choices --
that means that, rather than trying each branch of a choice one after the other,
if we know which branch it is because there's some sort of indicator earlier
in the data, we can just jump straight to it.
The specification allows for that,
and it's something we want to implement as well.
That's really useful for something that has lots of header-body-trailer content,
where the body can be one of many things, something like SWIFT.
Okay, so, let's now have a look at what DFDL looks like in integration bus.
So, I'm going to use Integration Bus as the term here, but what I say really applies
to both WMB v8 and IIB v9.
DFDL is embedded in WMB v8 and IIB v9, and it manifests itself in the way you might expect,
as a domain and an associated parser; just the same way that you have MRM, XMLNSC, MIME, SOAP,
DataObject, et cetera, we now have a DFDL domain.
So, if we go along to input node and click on the message domain dropdown,
you'll see DFDL right up there at the top.
And you would use it in IIB instead of using MRM CWF and TDS.
All right, those are the two things that it replaces.
Now, having said that I use the term replace, MRM [INAUDIBLE] isn't going away.
They're still there in v8, they're still there in v9, and they will continue to be there.
They are not deprecated; they are still first-class artifacts.
But it's fair to say that they won't get any further enhancements.
So DFDL schema files, they reside in libraries.
They don't reside in message sets.
And we didn't really see the point of trying to take DFDL schemas and manage them
in some way to fit the message set model.
If you're modeling text and binary data going forward with DFDL, that stuff goes in libraries.
So, the toolkit for Broker and IIB, that includes the wizards
and the others that I talked about before.
And it includes the model debugger as well.
Now, if you remember trying to model some data using MRM, you create your message set
and you create your schemas, your XSD files in the message set.
And then you think, okay, I need to test against some data.
Add that to a BAR file.
You create a flow.
Deploy it all across the Broker.
You try and stick some data through there which wouldn't work.
So you think, okay, how do I debug this?
So, you put on the user trace and you'd have to format the user trace and get it back again
and try and work out what on earth is happening.
All right, that's all history with DFDL,
because we have this model debugger which lives in the toolkit.
So you don't need a message flow.
You don't need a BAR file and you don't need to deploy it to the run time.
You can do it all within the tooling, and I'll show you that shortly.
The DFDL schema itself gets deployed in the BAR file.
There's no such thing as a dictionary file, okay?
I talked about this grammar that we generate in order to increase the performance; that's fine,
but that all takes place on the Broker.
You never see that grammar.
And finally, there isn't any automatic migration from MRM, all right?
Because MRM is not going away, that hasn't been a priority item for us.
If enough customers ask for it, then I guess it's something that will end up being done,
but we have plenty of other stuff to do at the moment, completing the specification
and improving the performance of the tooling and everything.
All right, so we're now going to look through the wizards and the editor and the debugger
with some animations I've built up.
So I'll go through these fairly quickly.
So, when you're using the Broker toolkit, the Integration Bus toolkit,
you'll see there's something called the new message model wizard, all right.
That's the way that you create XML schemas, DFDL schemas, WSDLs,
schemas for the WebSphere Adapters, all right, in libraries, all right?
This is the kind of replacement, if you want, for message sets.
Message sets are still there, but the way forward is using the new message model wizard, okay?
And that's the dialogue that pops up.
There's a section there called Text and Binary.
If you pick any of those radio buttons, the CSV, record-oriented text, COBOL, C, et cetera,
right, you will end up with a DFDL schema, all right.
That's what that section is going to create for you, all right.
Alternatively, you might already have a DFDL schema
in which case you can just drop it straight to a library.
Once you've chosen one of those options, you typically get offered one
of three ways of creating the schema.
You can create an empty schema and then use the editor to build it up.
You can use what's called guided authoring, which is going to prompt you with a series of questions
and try to generate as much of that schema as it can.
Or, if you've got things like COBOL copybooks or C headers, then you can actually import those
and get the schema created for you automatically.
So, as an example of the guided authoring wizard, this is for CSV, so you get asked a whole bunch
of questions: things like how many fields or columns you've got in your data,
is your first record a header, what's your end-of-record character -- carriage return line feed
or just line feed, for example.
Are you using the sort of escape scheme that CSV normally uses, which is
where it puts double quotes around fields that contain a comma, and things like that.
So, you answer all the questions.
You click OK, and then your DFDL schema gets generated.
Now, it's partly complete because, for example, you might want to rename the fields
from field1, field2 and so on to the field names you want.
And you might want to change some of the data types from string to something else.
But by and large, you have something you can work with.
That's a nice way of getting into the editor, so let's have a quick look at the editor now.
When you open the editor up, here it's opened up on a message
which is called CompanyTaggedDelimited.
What you're going to see in front of you in the main bit
of the editor is the logical structure of the data, all right?
Now at the moment, there's nothing DFDL at all about that.
All that you're seeing there is the XML schema subset
that DFDL uses for its logical data.
So, you can see elements, you can see sequences, you can see types,
minOccurs, maxOccurs, all right?
Those are all things out of pure XML schema.
The DFDL stuff comes on the righthand side.
If you select any of those objects in that schema such as the EmpName element there,
on the righthand side, you will see all the DFDL properties that you can use
to describe that particular object.
All right, so you can see here things like encoding, byte order, representation,
length kind, some of those properties that we saw in that DFDL example earlier on.
Now, when you first start up the editor, you'll be put into something called basic mode,
where you're shown a sort of subset of the properties.
There's a button at the top there that says Show advanced, and that shows you all the properties
that you may ever need to set on the object.
My recommendation is that you go straight to the advanced mode, right,
because we have had examples of users being in basic mode, not seeing a property,
and thinking that IBM DFDL just didn't support that property yet, right.
So, my advice is to move straight to the advanced mode.
So, once you've set all the property values that you need on your schema, you then save the file,
and in the usual way the Eclipse build runs and it's going to validate
that that particular schema is correct.
Now, there are a lot of validation rules that can apply to a DFDL schema, all based
on the rules in the specification.
So, if anything is wrong, what will happen is you'll get the usual red cross appearing
on the object in error, all right.
And if we can pin it down to a particular property, then you'll see that property in error
as well, and there will be an error message at the bottom in the Problems view, right.
That's pretty standard stuff.
So, when you've got that schema all sorted out and you've got no more errors,
you think you're ready to do some testing.
The next thing you do -- and we're now on Slide 26 --
is click on Test Parse Model at the top left, all right?
And that is going to run the DFDL parser inside the toolkit to test your data.
In fact, what it's going to do is run the IBM DFDL Java parser, right,
because obviously the toolkit is all in Java.
So, although this is running the Java parser, and the Broker run time uses the C parser, right,
you get the same behavior from both.
The two parsers, the C and the Java, are kept in lock step together.
Whenever we fix anything in one, we fix it in the other.
They are entirely in lock step.
So, if you click on that Test Parse Model button, on the righthand side
you'll see a dialogue that pops up, which is asking you for the root message to use
and for an input file of data.
Fill in that information, click okay, and you'll see that what happens is at the bottom
of the screen there, that data file has been loaded in, all right?
And that's your data waiting to be parsed.
There's a little green arrow there; that's the thing that actually gets the parser going.
So, you click on that and the parser is going to run.
And it's going to do its best to parse that data using the schema.
Now, in this example here, you'll see that parsing was successful, which is good.
And a balloon pops up that tells you so.
What you can see here is that at top right, you'll see that infoset I talked about before --
i.e., the logical data, all right?
So you can see it up there.
Yes, it's got all those instances of the repeating employee structure,
and it's populated the Empflow, Dept and Name and Address,
all those things there in that data.
Now, you can look upon that as being the equivalent of the body
of the Message Broker tree, all right.
Obviously, there are no MQ headers or anything like that in here, so it's not a full Broker tree;
it's just essentially the body, all right.
It's the DFDL infoset.
Now, you can also see what's happened down below.
There's a whole lot of coloring going on in that data.
And what that is, is the DFDL parser highlighting delimiters in the data, right.
So, at the moment, it has blue for initiators,
it has orange for terminators and pink for separators.
You can configure those colors if you like.
But that gives you a good idea that the parser is understanding the data correctly.
That all works very nicely.
If your data isn't just text but in fact is something coming from a COBOL program,
then we have a hex view which enables you to look at the actual data in hex as well.
And again, you will get that marked up and the delimiters marked up.
Now, what happens if something went wrong?
So we're on to Slide 27.
If something went wrong and the parser was unable to parse the data,
you will get a balloon up that says DFDL parser error.
It'll tell you the final error that the parser got, okay.
And it'll give you the infoset after the error.
And it'll show you the parsed data up to that point as well.
And it'll stop, you know, there's a red mark on line four there
if you can just about see that, okay?
Also, it gives you the object in error in the top left as well, in the main window.
There's a link that says Status, all right.
That's really useful to click on, because what it does is open
up something called the trace console, right, the DFDL test trace view.
Now, the DFDL parser and the serializer have really verbose user tracing
that covers everything that they do.
It's way better than anything the MRM has ever had, okay.
And the idea is that if you got an error, bring up the trace, go back to the start
and then work your way through trying to find out what you think has gone wrong
and you should be able to find it out from that trace view.
One important thing to bear in mind is that, because the DFDL parser is speculating,
the final error you get out may not be the original cause of the error,
because the parser may have tried to parse [INAUDIBLE], for example, and not succeeded
for some reason, then gone back and tried another branch, and so forth.
So you need to go back to the beginning and work your way through to find
out the initial cause of the error.
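To make that speculation concrete, consider a choice like this sketch (hypothetical element names, not from the sample):

```xml
<xs:choice>
  <!-- The parser tries this branch first... -->
  <xs:element name="amount" type="xs:int" dfdl:lengthKind="delimited"/>
  <!-- ...and only on failure backtracks and tries this one -->
  <xs:element name="note" type="xs:string" dfdl:lengthKind="delimited"/>
</xs:choice>
```

If the amount branch fails, the parser records that error, backtracks, and tries note; the error you see at the end may therefore relate to a branch that wasn't the real problem, which is why you walk the trace from the start.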
All right, so that's what happens when the test fails.
So that's the debugger; really, really useful.
And you can see we never went anywhere near a Broker run time.
No message flows or anything.
You're entirely within the DFDL editor, in fact.
And one last point on here is that you can see lots of underlining in the trace there.
That's because if you click on that you get taken straight to the object that's described
by that particular trace message.
So, the model and the data are linked together.
Okay, so you have debugged your model, it's all working happily.
Now you want to use it in a message flow for real,
so you're going to deploy your DFDL schema.
You're going to create a flow.
As you saw before, you're probably going to specify an input node.
You want to use a DFDL domain.
And the way you do it is you set the DFDL domain there on the message domain property
and you specify the message name under message, okay?
You don't specify the name of the library or the name of the schema, all right,
it gets that essentially from the context.
So, in other words, it will find that message in the application that you're using
if you're using applications or it will find it in the generic execution group
if you're not using applications.
All right, so you supply DFDL as the domain and you specify the message name.
And you also specify the parse timing, on demand or complete, in the same way as you can
with the other parsers in Integration Bus.
So, you can also switch on validation as well using the validation tab, okay.
But essentially, it behaves pretty much the same as any of the other parsers.
At the moment, there are no specific DFDL parser options, okay.
I'm sure some will get added over time, but there's nothing specific there at the moment.
We're going to look at the message tree.
So, here's a trace from the Broker message tree where we simply got our properties folder
and we've got a DFDL on the body.
And so you can see that the message name is appearing in the message type field there.
Message set and then message format is blank, they're not used in DFDL domain
because the body is owned by the DFDL domain.
And you can see here a difference between DFDL and MRM, all right.
The message name appears in the tree in the same way that it does for XMLNSC, all right.
That was quite deliberately done like that,
to make it more compatible with the way XMLNSC works.
Other things to note in there: the syntax elements are compact Name/Value style,
like what you get with XMLNSC as well.
And the data types that you see are the SQL types that map
to the equivalent types in the DFDL schema.
So you'll see integers, characters, timestamps, decimals, that sort of thing in your tree.
Okay, so that's using IBM DFDL inside the Broker.
What if you want to use it outside a Broker?
Okay, well, it turns out that you can do that.
The IBM DFDL Java classes can be used to create standalone Java apps that use DFDL, all right?
What you have to do is get hold of WMB v8 or IIB v9 and install that,
and then you can copy those classes to wherever you want to use them, all right;
maybe you use them on the same computer,
or you can use them on a remote computer as well.
When you copy that, there's a particular zip that you have to take,
and when you unzip that file, out pop the IBM DFDL for Java JAR files, a load of Javadoc,
a sample program, but also an ITLM license file.
All right, so when ITLM runs and does an audit of the system,
what it finds is what it thinks is a Broker.
And so you effectively get charged for having run a Broker on that machine, all right.
So, the idea was that you would use Broker Express.
That's the edition you would use, and that's a way of using IBM DFDL outside of Broker.
And that's really the only way you could do it outside of an IBM product at the moment,
you have to get Broker and then copy it out elsewhere.
Now, we talked a little about DFDL schemas.
That's the GitHub website where DFDL schemas are hosted.
What we're going to do now is just describe the industry formats that we model using DFDL
and how you can get hold of them.
So, in the usual manner,
you'll get fully-supported full-function DFDL schemas appearing as part of IBM products.
So, for example, you might get them shipped as part of integration bus; or more likely,
you'll see them appear as part of Connectivity pack for integration bus
such as the health care one or the retail one.
However, wherever we can, you know, for legal reasons and whatever, we will try
and make DFDL schemas available on that GitHub website, right.
They're unsupported, and they probably won't be entirely full function;
they'll be part function.
I'll describe what that means in a moment.
But they'll be there and you'll be able to get hold of those ahead of getting hold of them
from the official IBM product, all right?
So, we have HL7 schemas for v2.5.1, v2.6 and v2.7, right?
They're available in the Connectivity Pack for Healthcare, right.
On GitHub, you can also get hold of the schemas, but we've only put the v2.7 ones out there,
and we've only put out what's called the generic HL7 model, all right?
In the full set of schemas, you get a specific model for each different type of HL7 message
and the generic model that will handle anything, right.
We've just put the generic model up on GitHub.
Okay, that's kind of what I mean when I say part function.
In terms of the 4690 TLog format, all right,
we've got the ACE schemas on GitHub at the moment.
They will come out as part of the projected Connectivity Pack for Retail.
At some point in 4Q is the plan, okay.
And at some point as well, we'd also like to get the other two variants that you have
with TLog, called GSA and SA, into the Connectivity Pack as well.
A very commonly used format is ISO 8583, and in fact that's one of the first ones we tried
with DFDL because it's something the MRM just can't model.
So that's actually shipped inside Broker as a sample; it's also there on GitHub.
We have the 1987 version of that standard.
There's also a 1993 version of the standard, we're trying to get
that onto GitHub as soon as we can as well.
There's more to follow.
We're looking at some of the EDIFACT schemas as well, and NACHA, too.
So, just kind of watch the space on GitHub for new things appearing.
Okay, so that was everything that I had to talk about formally.
POWERS: That concludes this introductory video.
Don't forget, you can watch other videos on our Getting Started with DFDL series and get links
to many more DFDL related resources at ibm.biz/startdfdl.
[ MUSIC ]