OUTPUT with XML option can generate invalid XML with complex yet valid XPATH

Description

The OUTPUT action using the XML option is documented as follows:

XML - Specifies the file is output as XML data with the name of each field in the format becoming the XML tag for that field's data.

Consider the following field definition:

Layout := record
string20 article_id {xpath('/Row/article/@id')} ;
string1000 content {xpath('description/content')};
END;

If I were to use OUTPUT to write this XML:

ds:=dataset('~forum::xmlread',layout,xml('Row/article'));
output(ds,,'~xmlOut',xml,overwrite);

The tag generated for the content field is <description/content>, which throws an error when trying to read the file.
I would expect to see <content> according to the documentation.
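
To illustrate why the reader rejects the file, here is a minimal Python sketch (not ECL) that checks well-formedness with the standard library parser. The two row strings are hypothetical reconstructions based on the tags described above, not actual output captured from the platform, and `is_well_formed` is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a row when the xpath 'description/content' is used
# verbatim as the tag name (reconstructed from the report, not real output).
bad_row = "<Row><description/content>abc</description/content></Row>"

# What the documentation implies: the field name becomes the tag.
good_row = "<Row><content>abc</content></Row>"


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(bad_row))   # '/' is not allowed inside an element name
print(is_well_formed(good_row))
```

A '/' inside a start tag is only legal immediately before the closing '>' of an empty element, so any XML parser should reject the first row.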

If I use embedded datasets to reduce the XPATH to a single reference, it works correctly.

But shouldn’t the OUTPUT action use the actual field name instead of the XPATH information?

In summary, OUTPUT using the XML option should ignore complex xpaths and simply use the field name. In any case it should never create invalid XML.
Only the complex xpaths should be ignored, not the simple ones.

Note: using NOXPATH will eliminate the bad tag generation, but you need to rewrite the RECORD to a simplified layout to read the new file again.

See the following Forum post for additional discussion on this issue:

http://hpccsystems.com/bb/viewtopic.php?f=10&t=1610&sid=35f59a57694f45838ab8685288b1d918

Conclusion

None

Activity

Richard Chapman April 20, 2015 at 10:16 AM

this may need to go into the red book

Bob Foreman April 17, 2015 at 11:18 PM

Thanks Tony, do you think Jim should add a blurb to the documentation?

Anthony Fishbeck April 17, 2015 at 8:13 PM

The fix I am submitting will make sure these situations don't output invalid XML, but it won't make writing and reading symmetrical. I.e. if you use a record layout like this to write a file, the same layout won't read it. Building out the full xpath on fields like this would usually result in pretty nonsensical XML that would lead to mostly unintended results.

For example

Layout := record
string20 article_id {xpath('article/page/@id')} ;
string1000 content {xpath('article/description/content')};
END;

Could read in several different xml formats and we wouldn't know which one to write out.

So we won't write out bad XML, but we also won't try to guess an inexact format.

So after my fix if we read in either:
<Row><article><description><content>abc</content></description><page id="101">xyz</page></article></Row>
or
<Row><article><description><content>abc</content></description></article><article><page id="101">xyz</page></article></Row>
we will write out valid xml, but simplified to:

<Row><article_id>101</article_id><content>abc</content></Row>

Which can't be read back in the same record format because of the xpaths.
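
The asymmetry can be sketched outside ECL with a few lines of Python. This is only an approximation: it assumes the two xpaths from the example layout are resolved relative to each Row, uses the standard library's ElementTree path lookup in place of the platform's own xpath handling, and `read_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The two input shapes and the simplified output quoted above.
first = ('<Row><article><description><content>abc</content></description>'
         '<page id="101">xyz</page></article></Row>')
second = ('<Row><article><description><content>abc</content></description>'
          '</article><article><page id="101">xyz</page></article></Row>')
simplified = '<Row><article_id>101</article_id><content>abc</content></Row>'


def read_fields(row_xml):
    """Look up article/page/@id and article/description/content under <Row>."""
    row = ET.fromstring(row_xml)
    page = row.find('article/page')
    content = row.find('article/description/content')
    article_id = page.get('id') if page is not None else None
    text = content.text if content is not None else None
    return article_id, text


for label, xml_text in (('first', first), ('second', second),
                        ('simplified', simplified)):
    print(label, read_fields(xml_text))
```

Both input shapes yield the same two field values, so the layout alone does not say which shape to write back, while the simplified output contains neither path and both lookups come back empty.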

Fixed

Created March 2, 2015 at 8:38 PM
Updated May 13, 2015 at 8:28 AM
Resolved May 13, 2015 at 8:28 AM