Types of variable (column) and object (row) normalization formulas
data.Normalization(x, type="n0", normalization="column", ...)
Normalized data. The numeric shifts and scalings used (if any) are returned as attributes "normalized:shift" and "normalized:scale".
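The following base R sketch illustrates how the two attributes could be used to undo a normalization. It does not call `data.Normalization`; it builds a stand-in result by hand and assumes (this is not stated in the text above) that the normalized values equal `(x - shift) / scale`. The variable names are illustrative only.

```r
# Stand-in for a normalized vector: n1 computed by hand, with the shift and
# scale stored under the attribute names mentioned in the text.
x <- c(2, 4, 6, 8)
shift <- mean(x)
scale <- sd(x)

z <- (x - shift) / scale
attr(z, "normalized:shift") <- shift
attr(z, "normalized:scale") <- scale

# Assumption: z = (x - shift) / scale, so the original values come back as
# z * scale + shift. as.numeric() drops the attributes before the arithmetic.
restored <- as.numeric(z) * attr(z, "normalized:scale") +
  attr(z, "normalized:shift")

print(restored)
```

If the assumption holds for the real function, the same two `attr()` lookups applied to its return value would give the quantities needed to map normalized values back to the original scale.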
vector, matrix or dataset
type of normalization:
n0 - without normalization
n1 - standardization ((x-mean)/sd)
n2 - positional standardization ((x-median)/mad)
n3 - unitization ((x-mean)/range)
n3a - positional unitization ((x-median)/range)
n4 - unitization with zero minimum ((x-min)/range)
n5 - normalization in range <-1,1> ((x-mean)/max(abs(x-mean)))
n5a - positional normalization in range <-1,1> ((x-median)/max(abs(x-median)))
n6 - quotient transformation (x/sd)
n6a - positional quotient transformation (x/mad)
n7 - quotient transformation (x/range)
n8 - quotient transformation (x/max)
n9 - quotient transformation (x/mean)
n9a - positional quotient transformation (x/median)
n10 - quotient transformation (x/sum)
n11 - quotient transformation (x/sqrt(SSQ))
n12 - normalization ((x-mean)/sqrt(sum((x-mean)^2)))
n12a - positional normalization ((x-median)/sqrt(sum((x-median)^2)))
n13 - normalization with zero being the central point ((x-midrange)/(range/2))
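A few of the formulas above can be written out directly in base R for a single numeric vector. This is only a sketch of the arithmetic, not the package's implementation; it assumes that "range" means `max(x) - min(x)`, that "midrange" means `(max(x) + min(x)) / 2`, and that `mad` is `stats::mad` with its default scaling constant.

```r
x <- c(1, 2, 3, 4, 10)

rng <- max(x) - min(x)              # assumed meaning of "range"
midrange <- (max(x) + min(x)) / 2   # assumed meaning of "midrange"

n1  <- (x - mean(x)) / sd(x)                  # standardization
n2  <- (x - median(x)) / mad(x)               # positional standardization
n4  <- (x - min(x)) / rng                     # unitization with zero minimum
n5  <- (x - mean(x)) / max(abs(x - mean(x)))  # normalization in range <-1,1>
n13 <- (x - midrange) / (rng / 2)             # zero as the central point

print(round(cbind(n1, n2, n4, n5, n13), 3))
```

Under these assumptions, `n1` has mean 0 and standard deviation 1, `n4` spans 0 to 1, `n5` has a largest absolute value of 1, and `n13` spans -1 to 1.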
"column" - normalization by variable, "row" - normalization by object
arguments passed to sum, mean, min, sd, mad and other aggregation functions. In particular: na.rm - a logical value indicating whether NA values should be stripped before the computation
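The effect of `na.rm` on the underlying aggregation functions can be seen with a small base R sketch. It reproduces the n1 formula by hand on a vector containing one missing value; it does not call `data.Normalization`, and the variable names are illustrative.

```r
x <- c(1, 2, NA, 5)

# Without na.rm, mean() and sd() return NA, so every normalized value is NA.
z_keep <- (x - mean(x)) / sd(x)

# With na.rm = TRUE the statistics are computed from the observed values,
# and only the originally missing position stays NA.
z_drop <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

print(z_keep)
print(z_drop)
```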
Marek Walesiak marek.walesiak@ue.wroc.pl, Andrzej Dudek andrzej.dudek@ue.wroc.pl
Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland
See file ../doc/dataNormalization_details.pdf for further details
Thanks to Wolfgang Lederer (<wolfgang.lederer@gmail.com>) for reporting the n4/vector error.
Anderberg, M.R. (1973), Cluster analysis for applications, Academic Press, New York, San Francisco, London. ISBN 9780120576500.
Gatnar, E., Walesiak, M. (Eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych [Multivariate statistical analysis methods in marketing research], Wydawnictwo AE, Wroclaw, 35-38.
Jajuga, K., Walesiak, M. (2000), Standardisation of data set under different measurement scales, In: R. Decker, W. Gaul (Eds.), Classification and information processing at the turn of the millennium, Springer-Verlag, Berlin, Heidelberg, 105-112. Available at: doi:10.1007/978-3-642-57280-7_11.
Milligan, G.W., Cooper, M.C. (1988), A study of standardization of variables in cluster analysis, "Journal of Classification", vol. 5, 181-204. Available at: doi:10.1007/BF01897163.
Mlodak, A. (2006), Analiza taksonomiczna w statystyce regionalnej, Difin, Warszawa. ISBN 83-7251-605-7.
Walesiak, M. (2014), Przeglad formul normalizacji wartosci zmiennych oraz ich wlasnosci w statystycznej analizie wielowymiarowej [Data normalization in multivariate data analysis. An overview and properties], "Przeglad Statystyczny" ("Statistical Review"), vol. 61, no. 4, 363-372. Available at: doi:10.5604/01.3001.0016.1740.
cluster.Sim
library(clusterSim)
data(data_ratio)
z1 <- data.Normalization(data_ratio,type="n1",normalization="column",na.rm=FALSE)
z2 <- data.Normalization(data_ratio,type="n10",normalization="row",na.rm=FALSE)
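The difference between the "column" and "row" settings used in the two calls above can be illustrated with base R alone. The sketch below applies the n1 formula to each column and the n10 formula to each row of a small made-up matrix; it is an illustration of the idea under those assumptions, not the package's code, and the matrix `m` is hypothetical data.

```r
# Hypothetical data: 3 objects (rows) described by 2 variables (columns).
m <- matrix(c(1, 2, 3, 10, 20, 60), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("v1", "v2")))

# Normalization by variable: n1 applied to each column.
col_n1 <- apply(m, 2, function(v) (v - mean(v)) / sd(v))

# Normalization by object: n10 applied to each row.
# apply() over rows returns the results as columns, hence the transpose.
row_n10 <- t(apply(m, 1, function(v) v / sum(v)))

print(col_n1)
print(row_n10)
```

Each column of `col_n1` has mean 0 and standard deviation 1, while each row of `row_n10` sums to 1.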