Xml files

Introduction

Many files exported and imported by Geopsy software are based on XML structures, for instance (listed by their common extension):

  • page
  • mkup
  • layer
  • cpanel
  • ctparser
  • gpy
  • dinver
  • param
  • target

Decompression

These files are gzip-compressed TAR archives containing at least one file called contents.xml. They can be unpacked with tar:

tar xvfz Legend_page.page

Note that the 'z' option might not be necessary if your browser recognized the page file as a gzip-compressed file and silently decompressed it. Either way, the command produces a file named contents.xml. Some archives may also contain binary files named bin_data_1*. The format of those binary files is currently undocumented; refer to the source code for details.
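
The unpacking step above can be sketched as a small shell function that lists the archive members and then extracts them into a separate directory. The name unpack_geopsy is hypothetical and not part of Geopsy. The sketch assumes GNU tar or bsdtar, which normally detect compression automatically when reading an archive file, so the 'z' option is left out and the same call should work whether or not the file was already decompressed.

```shell
#!/bin/sh
# unpack_geopsy ARCHIVE DESTDIR
# Hypothetical helper (not part of Geopsy): list the members of ARCHIVE,
# then extract them into DESTDIR.
unpack_geopsy() {
    archive=$1
    destdir=$2
    mkdir -p "$destdir" || return 1
    # 't' lists the members without extracting anything.
    # Assumption: this tar detects gzip compression by itself when reading.
    tar tf "$archive" || return 1
    # 'x' extracts; -C selects the directory to extract into.
    tar xf "$archive" -C "$destdir"
}
```

For example, unpack_geopsy Legend_page.page Legend_page_unpacked would leave contents.xml (and any binary files) in the directory Legend_page_unpacked, without touching the current directory.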

Visualization

contents.xml can be viewed in any web browser (e.g. Firefox).

<SciFigs>
  <libVersion>2.3.0</libVersion>
  <type>Page</type>
  <GraphicSheet>
    <LegendWidget objectName="object">
      <objectName>object</objectName>
      <printX>1</printX>
      <printY>1</printY>
      <anchor>TopLeft</anchor>
...

Editing

The encoding of contents.xml is UTF-16, so it may not be directly editable in a text editor that does not support Unicode. However, most modern editors do:

  • Notepad++ (Windows only): UTF-16 is automatically recognized.
  • Vim: on most systems, UTF-16 is automatically recognized. If not, follow the instructions below.
  • Kate or KWrite: UTF-16 is not automatically recognized up to version 4.4 (later versions not tested), but it can be specified manually in the menu (Tools/Encoding/Unicode/UTF-16) or on the command line:
 kate contents.xml --encoding UTF-16
 kwrite contents.xml --encoding UTF-16
  • nano: UTF-16 might be supported, but a test with version 2.2.4 on a platform with LC_ALL=en_US.UTF8 was not successful.
  • Notepad and Wordpad (Windows only): UTF-16 is not supported. Follow the instructions below.
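
When an editor does not recognize the encoding, it can help to check the file itself first. The following is a minimal sketch that looks for a UTF-16 byte order mark (the bytes FF FE or FE FF) at the start of a file. The function name has_utf16_bom is hypothetical, and whether Geopsy always writes a byte order mark is an assumption not confirmed here: a file without one may still be UTF-16.

```shell
#!/bin/sh
# has_utf16_bom FILE
# Hypothetical helper: succeeds if FILE starts with a UTF-16 byte order mark.
# Assumption: the file carries a BOM; UTF-16 files without one are not detected.
has_utf16_bom() {
    # Read the first two bytes and print them as hexadecimal, e.g. "fffe"
    bom=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    case $bom in
        fffe|feff) return 0 ;;   # little-endian or big-endian BOM
        *) return 1 ;;
    esac
}
```

A possible use is has_utf16_bom contents.xml && echo "looks like UTF-16".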

ASCII Conversion

Converting to ASCII is useful if you do not have an editor that supports UTF-16 or if the command-line tools you want to use (e.g. grep, awk, sed) do not support UTF-16. iconv is used for encoding conversions:

iconv -c -f UTF-16 -t ASCII contents.xml > tmp && mv tmp contents.xml

Any special (non-ASCII) characters are lost in this conversion. This matters only for titles and other text displayed, for instance, in page files, when a language other than English is used. Note that editing the extracted contents.xml does not alter the original archive, so all modifications can still be discarded.

Now contents.xml can be manipulated as an ASCII file.
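
If non-ASCII characters must be kept, converting to UTF-8 instead of ASCII is a possible alternative, since UTF-8 is also accepted when the archive is rebuilt and is handled by most command-line tools. The sketch below is not the procedure described above but a variant of it; the function name to_utf8 is hypothetical. It only replaces the file if iconv succeeds.

```shell
#!/bin/sh
# to_utf8 FILE
# Hypothetical helper: re-encode FILE from UTF-16 to UTF-8 in place.
# The original is only overwritten when the conversion succeeds.
to_utf8() {
    tmpfile=$(mktemp) || return 1
    if iconv -f UTF-16 -t UTF-8 "$1" > "$tmpfile"; then
        mv "$tmpfile" "$1"
    else
        # Conversion failed: keep the original file untouched
        rm -f "$tmpfile"
        return 1
    fi
}
```

For example, to_utf8 contents.xml followed by grep or sed on contents.xml.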

Saving modifications or compression

To pack contents.xml back into the original compressed file format, the complete archive must be reconstructed:

 tar cvfz Legend_page.page contents.xml

In this case, UTF-16, UTF-8 and ASCII encodings are all accepted. If binary files were present in the original archive, they must be packed together with contents.xml. The order is not critical.

 tar cvfz Legend_page.page contents.xml bin_data_*
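
The two cases above (with and without binary files) can be combined in one function, sketched here under the assumption that the archive was unpacked into its own directory. The name repack_geopsy is hypothetical. It packs contents.xml together with whatever bin_data_* files exist in that directory, and nothing else.

```shell
#!/bin/sh
# repack_geopsy SRCDIR ARCHIVE
# Hypothetical helper: rebuild ARCHIVE from contents.xml and any bin_data_*
# files found in SRCDIR.
repack_geopsy() {
    srcdir=$1
    archive=$2
    (
        cd "$srcdir" || exit 1
        # Start the member list with contents.xml, then add binary files if any
        set -- contents.xml
        for f in bin_data_*; do
            # Skip the unexpanded pattern when no bin_data_* file exists
            [ -e "$f" ] && set -- "$@" "$f"
        done
        # Write the gzip-compressed archive to standard output
        tar czf - "$@"
    ) > "$archive"
}
```

For example, repack_geopsy Legend_page_unpacked Legend_page.page rebuilds the page file from the directory Legend_page_unpacked.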