Studies in Classification, Data Analysis,
and Knowledge Organization
Krzysztof Jajuga
Jacek Batóg
Marek Walesiak Editors
Classification and
Data Analysis
Theory and Applications
Studies in Classification, Data Analysis,
and Knowledge Organization
Managing Editors
Wolfgang Gaul, Karlsruhe, Germany
Maurizio Vichi, Rome, Italy
Claus Weihs, Dortmund, Germany
Editorial Board
Daniel Baier, Bayreuth, Germany
Frank Critchley, Milton Keynes, UK
Reinhold Decker, Bielefeld, Germany
Edwin Diday, Paris, France
Michael Greenacre, Barcelona, Spain
Carlo Natale Lauro, Naples, Italy
Jacqueline Meulman, Leiden, The Netherlands
Paola Monari, Bologna, Italy
Shizuhiko Nishisato, Toronto, Canada
Noboru Ohsumi, Tokyo, Japan
Otto Opitz, Augsburg, Germany
Gunter Ritter, Fakultät für Mathematik u. Informatik, Universität Passau, Passau, Germany
Martin Schader, Mannheim, Germany
More information about this series at http://www.springer.com/series/1564
Krzysztof Jajuga · Jacek Batóg · Marek Walesiak
Editors
Classification and Data
Analysis
Theory and Applications
Editors
Krzysztof Jajuga
Department of Financial Investments and Risk Management
Wroclaw University of Economics and Business
Wroclaw, Poland
Jacek Batóg
Institute of Econometrics and Statistics
University of Szczecin
Szczecin, Poland
Marek Walesiak
Department of Econometrics and Computer Science
Wroclaw University of Economics and Business
Wroclaw, Poland
ISSN 1431-8814 ISSN 2198-3321 (electronic)
Studies in Classification, Data Analysis, and Knowledge Organization
ISBN 978-3-030-52347-3 ISBN 978-3-030-52348-0 (eBook)
https://doi.org/10.1007/978-3-030-52348-0
Mathematics Subject Classification: 62Hxx, 62H25, 62H30, 62H86, 62-07, 62-09, 68Uxx, 68U20,
62Pxx, 62P12, 62P20, 62P25
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume presents papers from the 28th Conference of the Section of
Classification and Data Analysis of the Polish Statistical Society, held at the University of
Szczecin on September 18–20, 2019. The papers cover a wide range of recent
methodological aspects and applications of classification and data analysis tools
in micro- and macroeconomic problems. In the final selection, we accepted 20 of the
papers that were presented at the conference. Each submission was reviewed by two
anonymous referees, and the Authors subsequently revised their original manuscripts
to incorporate the referees' comments and suggestions. The selection criteria were
based on the contribution of the papers to the theory and applications of modern
classification and data analysis.
The chapters have been organized along the major fields and themes in
classification and data analysis: Methodology, Application in Finance, Application in
Economics and Application in Social Issues.
The part on Methodology contains five papers. The paper by Batóg and
Wawrzyniak focuses on modifications of selected formulas that yield a
transformation of a nominant into a stimulant ensuring that the order of objects
before and after the transformation is consistent with the real values of the
nominant. Dudek in his paper presents recommendations on the principles of correct
application of the Silhouette index, indicating that the “mechanical” use of the
index leads to results that do not correspond to the actual structure of the classes.
The paper by Grzenda discusses how discretization of continuous variables can
improve the classification accuracy of machine learning models, with an application
in demography of supervised discretization of continuous variables based on the
entropy criterion and the Gini criterion. Jefmański in his paper proposes an
intuitionistic fuzzy synthetic measure for ordinal data based on Hellwig’s linear
ordering method. The measure allows for a comparative analysis of objects with
respect to a complex phenomenon described by ordinal measurement scales and takes
into account the uncertainty in comparing objects, expressed in the form of neutral
points on the ordinal measurement scales. Pełka in his paper conducts research on
the usefulness and predictive power of extracting variables from neural networks
(multilayer perceptron for symbolic data) as a method of variable selection for the
purposes of ensemble learning for symbolic data.
The part on Application in Finance also contains five papers. The paper by
Doszyń, using the so-called Szczecin algorithm of real estate mass appraisal,
analyzes whether an econometric model with restrictions may support the process
of real estate mass appraisal by providing a more precise determination of the impact
of real property attributes on prices than an analogous model without
restrictions. Krężołek in his paper addresses the issue of estimating the tail index
of a probability distribution using the Hill estimator and its modification, comparing
selected non-parametric and parametric models. The paper by Mikulec and Misztal,
using prediction error curves based on bootstrap cross-validation estimates of the
prediction error, provides evidence that survival functions estimated with the
Kaplan-Meier method in each of the obtained subsets of objects enable a more
precise estimate of a firm’s duration than the Kaplan-Meier function for the total
data. Pawełek and Pociecha in their paper compare the prediction effectiveness of
the logit leaf model, a hybrid classification algorithm that enhances logistic
regression and decision trees, with that of individual classifiers. The paper by
Trzpiot examines the relation between economic, financial and demographic
variables and longevity in terms of long-term investment portfolios that are
sensitive to risk factors according to the APT portfolio factor model,
using Principal Component Regression.
The part on Application in Economics contains four papers. The paper by
Markowicz and Baran investigates the issue of mirror data concerning
intra-Community supplies of goods, with the use of their original indicators of data
asymmetry and an empirical example based on data from the Eurostat COMEXT
database. Cheba and Bąk in their paper explore the relationships between
sustainable development and green economy and assess the results obtained for the EU
countries in four particular areas using a taxonomic development measure based on
the Weber median. Misztal and Kupis-Fijałkowska in their paper analyze the ICT
development level in Poland against other European Union countries from the
perspective of individual users and households, using exploratory data analysis
methods and Hellwig’s method of linear ordering. Sagan and Grabowski in their
paper identify cause-effect relationships as the impact of unknown disturbing
variables affecting both the mediation and focal dependent variables by applying
simulations of the effect of correlated disturbances of dependent variables in
technology acceptance models on the degree of bias in the average causal mediation effect.
The part on Application in Social Issues contains six papers. Bieszk-Stolorz in
her paper verifies whether the risk of subsequent registrations in the labour office
depends on the characteristics of the unemployed persons, using
Prentice-Williams-Peterson conditional models, which consider the time until the event
occurs from the beginning of observation, and the time from the previous event. The
paper by Głowicka-Wołoszyn and Wysocki answers the question whether correction of
ideal values occurring in Hellwig’s and TOPSIS methods by the quartile criterion
contributed to the improvement of consistency between the identified levels of the
Polish communes’ financial autonomy and the synthetic measure values assigned to
them. Konarzewska in her paper conducts research on the problem of statistical
independence of chosen properties of objects and especially the choice of adequate
weights in multi-criteria rankings, applying the values of Variance Inflation Factors,
Principal Component Analysis and Multi-Criteria Principal Components.
Landmesser in her paper presents a comparison of personal income distributions
taking into account the gender income gap for 28 European countries, using the
Oaxaca-Blinder decomposition procedure, the decomposition procedure applied to
different quantile points along the whole income distribution, and finally the
counterfactual distribution based on the Recentered Influence Function regression approach.
The paper by Majewska and Trzpiot evaluates different approaches to identifying
the existence of common mortality trends and derives the time-varying mortality
indicator from the Lee-Carter model to obtain the similarities of different
countries via a semi-parametric comparison approach, in order to show that
multi-population mortality models are superior to individual mortality forecasting
models. Matuszewska-Janica in her paper verifies whether selected attributes of
employees affect the level of their wages, considering the impact of outliers on
changes in the relative importance of the analysed features.
We wish to thank the Authors for making their studies available for our volume.
Their scholarly efforts and research inquiries made this volume possible. We are
also indebted to the anonymous referees for providing insightful reviews with many
useful comments and suggestions.
In spite of our intention to address a wide range of problems pertaining to
classification and data analysis theory, there are issues that still need to be
researched. We hope that the studies included in our volume will encourage further
research and analyses in modern data science.
Wroclaw, Poland Krzysztof Jajuga
Szczecin, Poland Jacek Batóg
Wroclaw, Poland Marek Walesiak
January, 2020
Contents
Methods
Comparison of Proposals of Transformation of Nominants
into Stimulants on the Example of Financial Ratios of Companies
Listed on the Warsaw Stock Exchange
Barbara Batóg and Katarzyna Wawrzyniak
Silhouette Index as Clustering Evaluation Tool
Andrzej Dudek
The Role of Discretization of Continuous Variables in Socioeconomic
Classification Models on the Example of Logistic Regression Models
and Artificial Neural Networks
Wioletta Grzenda
Intuitionistic Fuzzy Synthetic Measure for Ordinal Data
Bartłomiej Jefmański
Improving Classification Accuracy of Ensemble Learning
for Symbolic Data Through Neural Networks’ Feature Extraction
Marcin Pełka
Applications in Finance
Inequality Restricted Least Squares (IRLS) Model of Real
Estate Prices
Mariusz Doszyń
Application of Hill Estimator to Assess Extreme Risks
in the Metals Market
Dominik Krężołek
Segmentation of Enterprises on the Basis of Their Duration
Using Survival Trees: Results of an Analysis for Legal Persons
and Organizational Entities Without Legal Personality
in the Łódzkie Voivodship
Artur Mikulec and Małgorzata Misztal
Corporate Bankruptcy Prediction with the Use of the Logit
Leaf Model
Barbara Pawełek and Józef Pociecha
The Impact of Longevity on a Valuation of Long-Term Investments
Returns: The Case of Selected European Countries
Grażyna Trzpiot
Applications in Economics
Sustainable Development and Green Economy in the European Union
Countries: Statistical Analysis
Katarzyna Cheba and Iwona Bąk
The Review of Indicators of Data Quality in Intra-Community Trade
in Goods. The Choice of an Indicator and Its Effect on the Ranking
of Countries
Iwona Markowicz and Paweł Baran
Development of ICT in Poland in Comparison with the European
Union Countries: Multivariate Statistical Analysis
Małgorzata Misztal and Aleksandra Kupis-Fijałkowska
Sensitivity Analysis in Causal Mediation Effects for TAM Model
Adam Sagan and Mariusz Grabowski
Applications in Social Problems
Prentice–Williams–Peterson Models in the Assessment of the Influence
of the Characteristics of the Unemployed on the Intensity
of Subsequent Registrations in the Labour Office
Beata Bieszk-Stolorz
Right-Skewed Distribution of Features and the Identification Problem
of the Financial Autonomy of Local Administrative Units
Romana Głowicka-Wołoszyn and Feliks Wysocki
Multi-criteria Rankings with Interdependent Criteria: Case of EU
Countries on Their Way to Healthy Lives and Well-Being
Iwona Konarzewska
The Comparison of Income Distributions for Women and Men
in the European Union Countries
Joanna Landmesser