
Opened 15 years ago

Closed 15 years ago

#355 closed defect (duplicate)

Invalid XML characters cause ParseError: not well-formed (invalid token)

Reported by: dbronner@… Owned by: dfraser
Priority: major Milestone: 0.6
Component: Build slave Version: 0.5.3
Keywords: Cc: mgood, osimons
Operating System: Linux

Description

Some of our test output contains characters like '\x01' (the single ASCII character at code point 1), causing a ParseError when minidom tries to read in the XML file. This particular data isn't very useful, but I'd like to prevent tests from crashing the build just by printing something in their assertions.

There are a bunch of potential solutions:

  1. Base64 encode (and decode) all text data.
  2. Replace all illegal characters with one legal one (something like �).
  3. Something else.

Since the whole point of this text is for a human to read it, I think I'd prefer option 2.
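For illustration, option 2 could be sketched as below. This is a minimal sketch in present-day Python 3 (the ticket itself concerns Python 2.4), not the code Bitten uses: the helper name `replace_illegal_xml_chars` and the `<message>` wrapper are made up for the example, and the character class is an assumption based on the XML 1.0 "Char" production.

```python
import re
from xml.dom import minidom

# Everything outside the XML 1.0 "Char" production is treated as illegal.
# Tab, LF and CR are the only legal characters below U+0020.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def replace_illegal_xml_chars(text, replacement="\uFFFD"):
    """Hypothetical helper: swap every illegal character for one legal one."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


# The cleaned text can be embedded in XML and parsed by minidom again.
cleaned = replace_illegal_xml_chars("assert failed: \x01")
doc = minidom.parseString("<message>%s</message>" % cleaned)
print(doc.documentElement.firstChild.data)
```

The replacement is lossy by design: the reader sees that something was there, but the original byte cannot be recovered, which is the trade-off against option 1.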

Attachments (1)

example.xml (406 bytes) - added by dbronner@… 15 years ago.
Results file with illegal XML character


Change History (10)

comment:1 Changed 15 years ago by dfraser

Could you include an example output file, and a traceback of exactly where the problem occurs?

The proper solution is probably to encode them in XML in a way that can be parsed and displayed.

Changed 15 years ago by dbronner@…

Results file with illegal XML character

comment:2 Changed 15 years ago by dbronner@…

I'm not sure how to invoke build/pythontools outside the context of our continuous build, but the path leading to the problem is:

pythontools.unittest -> xmlio.parse -> minidom.parse

  File "/path/to/Bitten-0.6dev-py2.4.egg/bitten/util/xmlio.py", line 185, in parse
    raise ParseError(e)
bitten.util.xmlio.ParseError: not well-formed (invalid token): line 6, column 16

This is from parsing the results file generated by the test:

    def test_foo():
        assert False, u"\x01"

-Dave
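The underlying parser failure can probably be reproduced without the build slave. The following is a rough Python 3 sketch that calls minidom directly instead of going through bitten.util.xmlio; the element names in `bad_xml` are placeholders, not the actual layout of the attached results file.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Stand-in for the results file: an element whose text is the raw \x01.
# The element names are placeholders chosen for this example.
bad_xml = "<report><test><message>\x01</message></test></report>"

try:
    minidom.parseString(bad_xml)
    error = None
except ExpatError as exc:
    # minidom surfaces expat's error; xmlio.parse re-raises it as ParseError.
    error = exc

print(error)
```

The printed message should carry the same "not well-formed (invalid token)" text as the traceback above, with a line and column that depend on the input.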

comment:3 Changed 15 years ago by dfraser

  • Owner changed from cmlenz to dfraser
  • Status changed from new to assigned

I'll try to have a stab at this if I get time. Just to clarify, the crash happens in the slave, right?

comment:4 Changed 15 years ago by dbronner@…

Correct.

comment:5 Changed 15 years ago by anonymous

This crash also happens on the master. The previous reported error happened as a result of the python:unittest command in the build recipe trying to form a unittest result to send back to the server, but this same kind of error happens server-side if another step in the recipe generates bad XML. The server-side error shows up as:

2009-06-04 11:36:32,221 Trac[master] ERROR: Error parsing build step result: not well-formed (invalid token): line 1, column 50604
Traceback (most recent call last):
  File "/tmp/bitten-0.6.0-r638/lib/python2.4/site-packages/bitten/master.py", line 204, in _process_build_step
  File "/tmp/bitten-0.6.0-r638/lib/python2.4/site-packages/bitten/util/xmlio.py", line 195, in parse
ParseError: not well-formed (invalid token): line 1, column 50604

This was caused by a make file outputting the character "\x1b" and trying to send it back to the server in a message tag.

comment:6 Changed 15 years ago by dbronner

Sorry, I forgot to identify myself. The last comment was also from me (the original reporter).

comment:7 Changed 15 years ago by osimons

  • Cc osimons added

It would be very interesting if you could try the latest patch on #119 to see how that behaves with your problematic characters.

comment:8 Changed 15 years ago by dbronner

Looking through the comments in there, I came across this patch which does fix the problem: http://bitten.edgewall.org/attachment/ticket/243/bitten-escape-chars.patch

This has been in trunk for ages, so I'm not sure why I don't have it in my copy.

I'd close this as a (resolved) duplicate of #243, but I don't see the option to do that.

BTW... #119 is indeed related and the patch is along the lines of what's needed here. It still doesn't handle the low ascii values which are illegal XML though:

    cgi.escape("\x01") == "\x01"
    "\x01".encode(sys.getfilesystemencoding(), 'replace') == "\x01"

comment:9 Changed 15 years ago by osimons

  • Resolution set to duplicate
  • Status changed from assigned to closed

Hmm. My #119 patch strips away that patch again as it can't be used as-is (current trunk only allows standard ascii through (string.printable)).

Reading up on it: http://www.w3.org/TR/xml11/#charsets

Seems I should reimplement this to strip away characters in the following ranges:

[#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F]

I'll add these notes to #119, seeing as this is a duplicate of an already solved issue, and I'll make sure to reimplement a way to strip these restricted characters from output in an updated patch. It would be good if you could keep an eye on that ticket, as I'll likely get an updated patch done this evening.
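As a rough idea of what stripping those ranges might look like, here is a small Python 3 sketch. It is not the #119 patch: the helper name `strip_restricted_chars` is invented for the example, and the pattern simply transcribes the ranges listed above.

```python
import re

# The restricted ranges quoted above:
# [#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F]
_RESTRICTED = re.compile("[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")


def strip_restricted_chars(text):
    """Hypothetical helper: drop restricted characters, keep everything else."""
    return _RESTRICTED.sub("", text)


# Output resembling a make step that emits ANSI colour codes plus a stray \x01.
print(repr(strip_restricted_chars("make: \x1b[31merror\x1b[0m\x01\n")))
# → 'make: [31merror[0m\n'
```

Note that NUL (#x0) is not in these ranges because it is not an XML character at all, so it would need separate handling.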
