# Proportion Data Logistic Regression

## Overview

The **Proportion Data Logistic Regression Model** module performs logistic regression on a dataset. The main difference between this module and the General Linear Model is that this module requires -Omic/microarray data with values ranging from 0 to 1, representing the probability of a certain event. The most common input is 0/1 data, where 0 means 'not observed' and 1 means 'observed'. Setting up the model is a 3-step process, similar to the General Linear Model.
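For readers unfamiliar with logistic regression on proportion data, the following is a minimal Python sketch of the general idea: it fits logit(p) = b0 + b1·x to responses between 0 and 1 using Newton's method. It is not Array Studio's implementation, whose estimation procedure is not described here; the function name `fit_logistic` and the sample values are hypothetical.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton's method.

    x: list of numeric covariate values (hypothetical example data)
    y: list of responses in [0, 1] (0/1 values or proportions)
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the (quasi-)binomial log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the 2x2 information matrix
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton update: solve the 2x2 system directly
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x_values = [0, 1, 2, 3, 4]
proportions = [0.1, 0.2, 0.5, 0.7, 0.9]

b0, b1 = fit_logistic(x_values, proportions)
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x_values]
print(round(b0, 3), round(b1, 3))
print([round(p, 3) for p in fitted])
```

Because the model includes an intercept, the fitted probabilities sum to the same total as the observed proportions once the iteration has converged, which is a quick sanity check on the fit.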

To run this module, go to **MicroArray | Inference | Other Tests | Proportion Logistic Regression Model**.

### Input Data Requirements

This module works on -Omic data types where the data values range from 0 to 1.
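Before running the module, it can help to confirm that every value really lies between 0 and 1. The following Python sketch shows one way to do that outside Array Studio; the function name `out_of_range_cells` and the sample matrix are hypothetical, and skipping NaN values is an assumption, since the handling of missing values is not described here.

```python
import math

def out_of_range_cells(rows):
    """Return (row_index, column_index, value) for every cell outside [0, 1].

    rows: list of lists of numbers, one inner list per variable.
    NaN cells are skipped (an assumption made for this sketch).
    """
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if isinstance(value, float) and math.isnan(value):
                continue
            if not 0.0 <= value <= 1.0:
                bad.append((i, j, value))
    return bad

data = [
    [0, 1, 1, 0],            # 0/1 "not observed" / "observed" data
    [0.25, 0.5, 1.2, 0.0],   # 1.2 is out of range
    [-0.1, 0.9, 0.3, 1.0],   # -0.1 is out of range
]

problems = out_of_range_cells(data)
print(problems)
```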

## General Options

### Input/Output

- **Project & Data**: The window includes a dropdown box to select the Project and Data object to be filtered.
- **Variables**: Selections can be made on which variables should be included in the filtering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).
- **Observations**: Selections can be made on which observations should be included in the filtering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).
- **Output name**: The user can choose to name the output data object.

### Options

**Step 1 (required): Specify Model**

This is where the user specifies the terms of the model. Getting the terms correct for each individual experiment is key to a successful logistic model. Clicking **Specify Model** opens the Specify Linear Model window, which has two sections:

**Columns**: This section contains columns from the Data object's Design Table.

- **Class**: If the column should be considered a Class term, the checkbox for that column can be selected. By default, Array Studio will guess what constitutes a Class term. In general, numeric columns are not considered Class terms by default, while other columns, such as "Factors", are. Users who are not sure whether a column should be a Class term should consult a statistician. In the example shown below, time should be considered a Class term, but because Array Studio made it a numeric column, it is not one by default. Changing this in the Design Table will affect the default behavior here.
- **Term**: The factors in the design table.

**Construct Model**: This section is where the user can add the terms to the model. By selecting terms on the left, the user can use the Add, Cross, Nest, and Remove buttons to select the terms for that particular model.

- **Add**: Clicking this button will add the selected terms to the model.
- **Cross**: Clicking this button will cross the terms selected on the left (this is discussed in more detail later).
- **Nest**: Clicking this button will nest the selected term on the left panel to the selected term on the right panel.
- **Remove**: Clicking this button will remove the selected terms in the right panel.
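As a rough illustration of what crossed and nested terms mean in general statistical usage, the Python sketch below lists the level combinations each kind of term distinguishes. The factor names, levels, and the `a:b` and `b(a)` labels are hypothetical and are not Array Studio's own notation.

```python
from itertools import product

def crossed_cells(levels_a, levels_b):
    """All level combinations distinguished by crossing factor A with factor B."""
    return [f"{a}:{b}" for a, b in product(levels_a, levels_b)]

def nested_cells(observations):
    """Observed level combinations for factor B nested within factor A.

    observations: list of (a_level, b_level) pairs.
    """
    return sorted({f"{b}({a})" for a, b in observations})

# Crossing: every treatment level is combined with every time level.
treatment = ["control", "drug"]
time = ["0h", "24h"]
crossed = crossed_cells(treatment, time)
print(crossed)

# Nesting: each subject appears under only one treatment level.
subject_assignments = [
    ("control", "s1"), ("control", "s2"),
    ("drug", "s3"), ("drug", "s4"),
]
nested = nested_cells(subject_assignments)
print(nested)
```

A crossed term covers every combination of the two factors, whereas a nested term only has meaning within the level of the factor it is nested in.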

**Step 2 (required): Specify Test**

The second step is to specify the test (FTest) for any of the terms in the model. In this window, the user can add any of the terms from the model. This is shown in more detail below.

**Options**: Users can specify which parameter estimates for the FTest will be output.

- **Raw p-value**: The raw p-value of the F-test.
- **Adjusted p-value**: The adjusted p-value, based on the raw p-value.
- **Generate significant list**: Generate a list that contains the variables that have an adjusted p-value less than the threshold (by default it is 0.05).

**Step 3 (optional): Change Options**

This step provides additional options that the user can change. While this step is considered optional, the user should verify these settings before proceeding.

**General**

- **ANOVA test type**: Type1, Type2, Type3, and Type4 (Type3 is the default option). These are sum-of-squares types related to **ANOVA**. They only make a difference if the design is unbalanced. Types 1, 2, and 3 are universally accepted, and Type 4 is SAS specific. Type 3 is the most commonly used and is generally appropriate. For additional information, see http://en.wikipedia.org/wiki/Explained_sum_of_squares.
- **Multiplicity**: FDR_BH, FDR_BY, Bonferroni, Sidak, StepDownBonferroni, StepDownSidak, and StepUp (FDR_BH is the default option).
- **Alpha level**: For the generation of estimate Lists, the user can specify an Alpha level cutoff (p-value cutoff; by default this is 0.05).
- **Select list folder**: Allows the user to select the folder into which any generated lists will go.
- **Generate overall significant list**: This checkbox creates a "master" list that encompasses all significant rows from all comparisons/FTests/etc.

- Note: The Multiplicity adjustment takes into account the total number of tests performed within a given analysis. The default can instead be set to adjust p-values on a per-test basis. Please refer to the "Statistics" section of the User Guide for details.
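To illustrate how a multiplicity adjustment interacts with the Alpha level cutoff, here is a minimal Python sketch of the standard Benjamini-Hochberg procedure, which the FDR_BH option presumably corresponds to. It is not Array Studio's code; the function name `bh_adjust`, the variable names, and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values, one per variable.
names = ["var_a", "var_b", "var_c", "var_d", "var_e"]
raw = [0.001, 0.008, 0.039, 0.041, 0.60]

alpha = 0.05  # same default cutoff as the Alpha level option
adjusted = bh_adjust(raw)
significant = [n for n, p in zip(names, adjusted) if p < alpha]

for n, r, a in zip(names, raw, adjusted):
    print(n, r, round(a, 5))
print(significant)
```

In this made-up example, `var_c` and `var_d` have raw p-values below 0.05 but adjusted p-values above it, so only `var_a` and `var_b` would end up in a significant list built from adjusted p-values.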

## Output Results

The module outputs a logistic regression Inference Report. (Example screenshot not included.)

## OmicScript

Proportion Logistic Regression Model