
From Array Suite Wiki

Parse Column


The Parse Column command in Array Studio lets the user parse (split) one column into several, using either a simple pattern (the easiest option) or a regular expression. It works much like the Text to Columns function in Excel, which splits a single column of text into multiple columns. For instance, the group column shown below contains treatment and time separated by a period. Parse Column can separate treatment and time into two new columns in a few steps, as demonstrated below. The command is accessed from Table | Columns | Parse Column.
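The delimiter-based split described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not Array Studio code; the function name split_column and the sample group values are hypothetical.

```python
# Hypothetical sketch: split each value of a column on a delimiter,
# similar in spirit to Excel's Text to Columns. Not Array Studio code.
def split_column(values, delimiter):
    """Split each string on `delimiter` and return a list of new columns."""
    rows = [v.split(delimiter) for v in values]
    width = max((len(r) for r in rows), default=0)
    # Pad short rows with empty strings so every new column has one cell per row.
    padded = [r + [""] * (width - len(r)) for r in rows]
    # zip(*padded) turns the list of rows into a list of columns.
    return [list(col) for col in zip(*padded)]


group = ["Control.0h", "Control.24h", "DrugA.0h", "DrugA.24h"]  # hypothetical values
treatment, time = split_column(group, ".")
print(treatment)  # ['Control', 'Control', 'DrugA', 'DrugA']
print(time)       # ['0h', '24h', '0h', '24h']
```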


Input Data Requirements

This command works on all Table objects, including Design Tables and Annotation Tables.

Step 1: Select the source Table

The user is first asked to choose the table containing the column to parse; the Parse Column window then opens.


Step 2: Choose Column

The left side of the "Parse Column" window contains a large box listing every column in the Table. Select the column to parse in that box.


Step 3: Specify how to parse the Column

In the Input the pattern to extract columns box, enter the delimiter that separates the values in the column. In the example below, the user entered a . (period). Click GO to preview the new columns. Any column header or cell in the preview can be edited by double-clicking it, and a column can be selected by single-clicking its header. Check Export selected columns only to add just the selected preview columns to the table rather than all of them. The Set Insert Position option sets where the new column(s) are inserted in the Table object.
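The Step 3 options can be modelled in a short Python sketch. It assumes a table is an ordered list of (column name, values) pairs; parse_into_table, new_names, keep, and insert_at are hypothetical names standing in for the editable preview headers, Export selected columns only, and Set Insert Position. It illustrates the behaviour described above and is not how Array Studio implements the command.

```python
# Hypothetical model of the Step 3 options; not Array Studio's implementation.
# A table is an ordered list of (column_name, values) pairs.
def parse_into_table(table, source, delimiter, new_names, keep=None, insert_at=None):
    values = dict(table)[source]
    pieces = [v.split(delimiter) for v in values]
    # One new column per header; a missing piece becomes an empty string,
    # and pieces beyond the number of headers are ignored.
    new_cols = [
        (name, [p[i] if i < len(p) else "" for p in pieces])
        for i, name in enumerate(new_names)
    ]
    if keep is not None:
        # Stand-in for "Export selected columns only".
        new_cols = [(n, col) for n, col in new_cols if n in keep]
    if insert_at is None:
        # This sketch appends at the end by default.
        insert_at = len(table)
    # Stand-in for "Set Insert Position": splice the new columns in.
    return table[:insert_at] + new_cols + table[insert_at:]


table = [("Sample", ["S1", "S2"]), ("group", ["Control.0h", "DrugA.24h"])]
out = parse_into_table(table, "group", ".", ["treatment", "time"],
                       keep={"time"}, insert_at=1)
print([name for name, _ in out])  # ['Sample', 'time', 'group']
print(dict(out)["time"])          # ['0h', '24h']
```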

If Use regular expression is checked, the pattern can instead be written in the regular expression syntax described under Example Usage below.

Output Results

After this is done, the user can see the new columns in the table:


Example Usage

Note: The Use Regular Expression checkbox in Step 3 enables more complex parsing. It lets the user enter an expression that splits columns with more complicated patterns. For instance, suppose a column contains values such as T1_DBP_10yrs and the user wants a numeric time column, a treatment column, and a numeric age column. Splitting on the underscore alone cannot do this: without the checkbox checked, the user would end up with the three columns T1, DBP, and 10yrs, when what they really want is 1, DBP, and 10.

The regular expression format works as follows: each pair of parentheses () marks a field to extract as a new column, and any text outside the parentheses is matched and left out. To parse this value the same way as without the Use Regular Expression checkbox, the user would enter ()_()_(). This tells Array Studio to create a column for each field and to leave out each underscore. In this more complicated example, however, the user also wants to leave out the leading T and the trailing yrs, so the correct input pattern is T()_()_()yrs. This is shown below:
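For readers who know a programming language, the pattern T()_()_()yrs corresponds roughly to the Python regular expression below. This is an approximation under a stated assumption: each () is treated as a non-greedy capture group (.*?), and the whole value must match. The second sample value, T2_SBP_25yrs, is hypothetical.

```python
import re

# Approximation (assumption): each () in T()_()_()yrs becomes a non-greedy
# capture group, and fullmatch requires the whole value to match.
PATTERN = re.compile(r"T(.*?)_(.*?)_(.*?)yrs")

values = ["T1_DBP_10yrs", "T2_SBP_25yrs"]  # second value is hypothetical
rows = [PATTERN.fullmatch(v).groups() for v in values]

time_col = [int(r[0]) for r in rows]   # numeric time column
treatment_col = [r[1] for r in rows]   # treatment column
age_col = [int(r[2]) for r in rows]    # numeric age column
print(time_col, treatment_col, age_col)  # [1, 2] ['DBP', 'SBP'] [10, 25]
```

Because the T and yrs are part of the pattern but outside the groups, they are consumed by the match and never appear in the extracted columns, which is what lets the time and age values convert cleanly to numbers.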


When the command is done, the user can see the changes in the table:


Note: Certain characters must be enclosed in single quotes when using regular expressions. For instance, if a column contains values such as treatment.time and the user wants to split them into two columns on the period (.) with the Use Regular Expression checkbox checked, the user would enter ()'.'().
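Why the quotes matter can be illustrated with a hypothetical translator from this pattern syntax into Python regular expressions. It assumes (inferred from the examples on this page, not from Array Studio documentation) that () becomes a capture group, quoted text is matched literally, and unquoted text keeps its regular expression meaning, so a bare . matches any character. The function name to_python_regex is illustrative only.

```python
import re


def to_python_regex(pattern):
    """Translate a Parse Column-style pattern into a compiled Python regex.

    Hypothetical rules inferred from the examples on this page:
      ()      -> a field to extract (non-greedy capture group)
      'text'  -> literal text, with special characters escaped
      other   -> passed through, keeping its regular expression meaning
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("()", i):
            parts.append("(.*?)")
            i += 2
        elif pattern[i] == "'":
            end = pattern.index("'", i + 1)  # ValueError if the quote is never closed
            parts.append(re.escape(pattern[i + 1:end]))
            i = end + 1
        else:
            parts.append(pattern[i])
            i += 1
    return re.compile("".join(parts))


quoted = to_python_regex("()'.'()").fullmatch("treatment.time").groups()
unquoted = to_python_regex("().()").fullmatch("treatment.time").groups()
print(quoted)    # ('treatment', 'time')
print(unquoted)  # ('', 'reatment.time'): the bare . matched the first 't'
```

Under these assumed rules, the quoted period is matched literally and the value splits as intended, while the unquoted period acts as a wildcard and the split goes wrong.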

Related Articles