# Estimate Kernel Density

## Overview

Kernel density estimation (also known as the Parzen window method) is a non-parametric way of estimating the probability density function of a random variable. As an illustration, given data from a sample of a population, kernel density estimation makes it possible to extrapolate from that sample to the distribution of the entire population. For more information on kernel density estimation, see http://en.wikipedia.org/wiki/Kernel_density.
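As a rough illustration of the idea, the sketch below computes a Gaussian kernel density estimate in plain Python. The function name `gaussian_kde`, the sample values, and the bandwidth of 0.5 are assumptions chosen for the example; they are not taken from this module and do not describe how it computes its result.

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps centred on each data point."""
    total = 0.0
    for xi in data:
        u = (x - xi) / bandwidth
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (len(data) * bandwidth)

# Hypothetical sample of log-scale expression values (made up for illustration).
sample = [6.1, 6.8, 7.2, 7.4, 7.9, 8.3, 9.0]

# Evaluate the estimate on a coarse grid from 5.0 to 10.0.
grid = [5.0 + 0.5 * i for i in range(11)]
density = [gaussian_kde(sample, x, bandwidth=0.5) for x in grid]

for x, d in zip(grid, density):
    print(f"{x:5.1f}  {d:.4f}")
```

Each data point contributes one smooth bump, and the estimate at any position is the average of those bumps. A larger bandwidth gives a smoother curve, and a smaller one follows the individual data points more closely.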

One common use of this module is to visualize whether the data approximate a normal distribution, before running statistical modules designed for microarray data.

To run this module, select **MicroArray | Summarize | Kernel Density**.

### Input Data Requirements

This module works on -Omic data types.

## General Options

### Add file

**Project & Data**: The window includes a dropdown box to select the Project and Data object to be filtered.

**Variables**: Selections can be made on which variables should be included in the filtering (options include All variables, Selected variables, Visible variables, and Customized variables (select any pre-generated Lists)).

**Observations**: Selections can be made on which observations should be included in the filtering (options include All observations, Selected observations, Visible observations, and Customized observations (select any pre-generated Lists)).

**Output name**: The user can choose to name the output data object.

### Options

**Kernel type**: The kernel type users would like to fit. Users can choose among **Gaussian**, **Epanechnikov**, **Rectangular**, **Triangular**, **Biweight**, **Cosine**, or **OptCosine**, depending on the expected best-fit distribution for the data.

**N (estimation precision)**: The number of bins. Intuitively, one wants to choose **N** as large as possible; however, there is always a trade-off between the bias of the estimator and its variance. The default value is 512.
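To make the two options more concrete, the sketch below writes out one common textbook form of each of the seven kernel shapes and evaluates a density estimate at **N** equally spaced positions. It is an illustration under stated assumptions, not this module's implementation: the exact kernel definitions (in particular Cosine versus OptCosine), the fixed bandwidth, the padding of the range, and the names `KERNELS` and `estimate_density` are all hypothetical.

```python
import math

# Kernel shapes as functions of the standardized distance u; each integrates to 1.
# These are common textbook forms, assumed here for illustration only.
KERNELS = {
    "Gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi),
    "Epanechnikov": lambda u: 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0,
    "Rectangular": lambda u: 0.5 if abs(u) <= 1.0 else 0.0,
    "Triangular": lambda u: 1.0 - abs(u) if abs(u) <= 1.0 else 0.0,
    "Biweight": lambda u: (15.0 / 16.0) * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0,
    "Cosine": lambda u: 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0,
    "OptCosine": lambda u: (math.pi / 4.0) * math.cos(math.pi * u / 2.0) if abs(u) <= 1.0 else 0.0,
}

def estimate_density(data, kernel="Gaussian", n=512, bandwidth=0.5, pad=3.0):
    """Return (xs, ys): n equally spaced positions and the density estimate at each one."""
    k = KERNELS[kernel]
    lo = min(data) - pad * bandwidth   # extend the range so the tails are covered
    hi = max(data) + pad * bandwidth
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ys = [sum(k((x - xi) / bandwidth) for xi in data) / (len(data) * bandwidth) for x in xs]
    return xs, ys

# Hypothetical sample values (made up for illustration).
sample = [6.1, 6.8, 7.2, 7.4, 7.9, 8.3, 9.0]

xs, ys = estimate_density(sample, kernel="Epanechnikov")
print(len(xs), len(ys))
```

In this sketch, `n` only controls how finely the curve is sampled, while `bandwidth` controls how smooth it is. The bandwidth here scales the distance `u` directly, which may differ from the convention the module uses.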

## Output Results

An example Kernel Density plot is shown below and appears as a **DensityView** in the **Summary** folder of the **Tables** section in the **Project Explorer**.

If users add a **Table** view for the Density result, the table will have twice as many columns as there are samples, because each sample has two columns.

Taking chip '01 A' as an example, the first column, '01 A X', shows the X axis position, and the column '01 A' shows the corresponding density value. If users keep the default **N (estimation precision)**, there will be 512 rows, corresponding to 512 bins.
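The table layout described above can be mimicked with a small sketch. The chip names follow the '01 A' example, but the second chip, the data values, the Gaussian kernel, the bandwidth, and the helper `density_columns` are hypothetical and only serve to show the shape of the result.

```python
import math

def density_columns(values, n=512, bandwidth=0.5):
    """Return (positions, densities) at n equally spaced points, using a Gaussian kernel."""
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    ys = [sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm for x in xs]
    return xs, ys

# Hypothetical chips with made-up values.
chips = {
    "01 A": [6.1, 6.8, 7.2, 7.4, 7.9, 8.3, 9.0],
    "02 B": [5.9, 6.5, 7.0, 7.7, 8.1, 8.8, 9.4],
}

# Two columns per chip: "<name> X" holds positions, "<name>" holds density values.
table = {}
for name, values in chips.items():
    xs, ys = density_columns(values)
    table[f"{name} X"] = xs
    table[name] = ys

print(list(table))          # column names, two per chip
print(len(table["01 A"]))   # number of rows, equal to n
```

With two chips, the sketch produces four columns of 512 rows each, matching the two-columns-per-sample layout.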

Users can also calculate other parameters of the data distribution in the Microarray Summary Statistics module, such as skewness and kurtosis.
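For readers unfamiliar with these two measures, the sketch below computes moment-based skewness and excess kurtosis in plain Python. The formulas are the simple population versions and are an assumption; the Microarray Summary Statistics module may use a different, for example bias-corrected, definition. The function names and sample values are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the mean of (value - mean) raised to the power k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

# Made-up values: one symmetric sample and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```

Values of both measures near zero are consistent with an approximately normal distribution, which complements the visual check provided by the density plot.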