Pooling Facial Segments to Face: The Shallow and Deep Ends

Upal Mahbub*, Sayantan Sarkar*, Rama Chellappa
Department of Electrical and Computer Engineering and the Center for Automation Research, UMIACS, University of Maryland, College Park, MD 20742
{umahbub, ssarkar2, rama}@umiacs.umd.edu
*First two authors contributed equally

Abstract—Generic face detection algorithms do not perform very well in the mobile domain due to the significant presence of occluded and partially visible faces. One promising technique for handling the challenge of partial faces is to design face detectors based on facial segments. In this paper, two such face detectors, namely SegFace and DeepSegFace, are proposed that detect the presence of a face given arbitrary combinations of certain face segments. Both methods use proposals from facial segments as input that are found using weak boosted classifiers. SegFace is a shallow and fast algorithm using traditional features, tailored for situations where real-time constraints must be satisfied. On the other hand, DeepSegFace is a more powerful algorithm based on a deep convolutional neural network (DCNN) architecture. DeepSegFace offers certain advantages over other DCNN-based face detectors, as it requires a relatively small amount of training data by utilizing a novel data augmentation scheme and is very robust to occlusion by design. Extensive experiments show the superiority of the proposed methods, especially DeepSegFace, over other state-of-the-art face detectors in terms of precision-recall and ROC curves on two mobile face datasets.

I. INTRODUCTION

In recent years, there has been substantial progress in the development of efficient and robust face detection techniques, mostly because of the remarkable progress in convolutional neural network (CNN) architectures for face detection and the availability of a large amount of face data [21][29][5]. Though the general trend of developing new face detectors is centered around detecting faces in unconstrained environments with large variations in pose and illumination [19][9][8][10], there is also a growing interest in developing face detectors optimized for detecting occluded and partially visible faces in images captured with mobile devices [13][25]. Reliable and fast detection of faces from the front-camera captures of a mobile device is a fundamental step for applications such as active/continuous authentication of the user of a mobile device [18][14][33][17].

Fig. 1. Sample face detection outcome of the proposed DeepSegFace method on the UMDAA-02-FD dataset. The first three rows show correct detections at different illumination, pose and partial visibility scenarios, while the last row shows some incorrect detections.

Although face-based authentication systems on mobile devices rely heavily on accurate detection of faces prior to verification, most state-of-the-art techniques are ineffective for mobile devices because of the following reasons:

1) Face images captured by the front camera of the phone are, in many cases, only partially visible [14]. Traditional face detectors such as Viola-Jones's [30] and the deformable part model (DPM) [19], and more advanced face detectors based on deep convolutional neural networks (DCNNs) such as HyperFace [21], Yahoo Multiview [5] and CUHK [29], are usually trained on full faces. While they work well on detecting multiple frontal or profile faces at various resolutions, they frequently fail to detect partial faces.
2) For active authentication, the recall rate needs to be high at very high precision. Many of the available face detectors have a low recall rate even though the precision is high. When operated at high recall, their precision drops rapidly because of excessive false positive detections.
3) The algorithm needs to be simple, fast and customizable in order to operate in real time on a cellular device. While in [25] and [24] the authors deploy CNNs on mobile GPUs for face detection and verification, most CNN-based detectors, such as [21], and some generic methods like [19], are too complex to run on the mobile platform.

On the other hand, images captured for active authentication offer certain advantages for the face detection problem because of their semi-constrained nature [13]. Usually there is a single user in the frame, hence there is no need to handle multiple face detection. The face is in close proximity to the camera, within a certain range of dimensions and of high resolution, thus eliminating the need for detection at multiple scales or resolutions.

Partial faces such as those present in images captured by the front camera of mobile devices can be handled if the algorithm is able to effectively combine detections of facial parts into a full face detection. Therefore, to address this requirement, two algorithms, SegFace and DeepSegFace, are proposed in this paper that detect faces from proposals made of face segments. Sample results for DeepSegFace are shown in Fig. 1. The proposals are generated using weak AdaBoost cascade classifiers following the method described in [13]. This paper makes the following contributions:

1) SegFace, a fast, real-time face segment to face detector, is proposed.
2) DeepSegFace, a novel deep CNN-based architecture that detects faces from facial segments-based proposals, is developed.
3) A principled scheme is developed that helps augment data as well as regularize the classifier.

In section II a summary of works on face detection, in general and in the mobile domain, is given. In section III, the proposed face detection techniques are described in detail. A brief description of the two mobile face datasets that are used for experimental validation is given in section IV. All the analysis, experimental results and comparisons of the proposed methods with state-of-the-art methods are provided in section V. Finally, a brief summary of this work as well as future directions of research are included in section VI.

II. RELATED WORKS

Face detection is one of the earliest applications of computer vision, dating back several decades [3][23]. However, most methods before 2004 performed poorly in unconstrained conditions, and therefore were not applicable in real-world settings [31]. Viola and Jones's seminal work on boosted cascaded classification-based face detection [30] was the first algorithm that made face detection feasible in real-world applications, and it is still used widely in digital cameras, smartphones and photo organization software. The method, however, works reasonably well only for near-frontal faces under normal illumination without occlusion [32]. Extensions of the boosted architecture for multi-view face detection are found in the literature, such as [7][11], but these detectors are difficult to train and do not perform well because of inaccuracies introduced by viewpoint estimation and quantization [32]. A more robust face detector is introduced in [19], which uses facial components or parts to construct a deformable part model (DPM) of a face. Similar geometrical modeling approaches are found in [1][15]. In [26], the authors introduced an exemplar-based face detection method that does not require multi-scale shifting windows. As support vector machines (SVMs) became effective for classification, and robust image features like SURF, local binary patterns (LBP), histograms of oriented gradients (HoG) and their variants were designed, researchers proposed different combinations of features with SVMs for robust face detection [31]. Recently, in [16] the authors improved the performance of the DPM-based method and also introduced HeadHunter, a new face detector that uses integral channel features (ICF) with boosting to achieve state-of-the-art performance on face detection in the wild. A fast face detector that uses the scale-invariant and bounded normalized pixel difference (NPD) features is proposed in [12]; it uses a single soft-cascade classifier to handle unconstrained face detection. The method is claimed to achieve state-of-the-art performance on the FDDB, GENKI, and CMU-MIT datasets.

The performance breakthrough observed after the introduction of deep convolutional neural networks (DCNNs) can be attributed to the availability of large labeled datasets, the availability of GPUs, the hierarchical nature of the deep networks and regularization techniques such as dropout [31]. Continuous authentication of mobile devices requires partially visible face detection and verification to operate reliably [14]. In [13], the authors introduced a face detection method based on facial segments to detect partial faces in images captured for active authentication with smartphones. The idea was to produce face proposals by employing a number of weak AdaBoost facial segment detectors on each image and clustering the detections. The authors proposed to form at most ζ subsets of facial segments from each cluster. Unique proposals were obtained by filtering out the redundant clusters that have exactly the same facial segments with the same bounding boxes. Statistical features from the unique proposals were then used to train a support vector machine classifier for face detection. The method worked well on the AA-01-FD [33] and UMDAA-02-FD [14] mobile face detection datasets compared to other non-CNN methods [14]. [25] and [27] are two other methods that explicitly address the partial face detection problem. Especially in [27], the authors achieve state-of-the-art performance on the FDDB, PASCAL and AFW datasets by generating face part responses from an attribute-aware deep network and refining the face hypothesis for partial faces. Among other recent works, HyperFace [21] is a deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. The method exploits the synergy among related tasks by fusing the intermediate layers of a deep CNN using a separate CNN, thereby boosting their individual performances.

III. PROPOSED METHOD

There are multiple paradigms of face detection for the mobile platform, one of which is to detect a face from facial segments because of the significant presence of partial faces in this domain, as discussed earlier. Therefore SegFace, a simpler traditional feature-based scheme, and DeepSegFace, a DCNN-based architecture, both of which detect faces from proposals composed of face segments, are proposed in this paper.

Fig. 2. Block diagram showing the general architecture of a face segment to face detector, with components such as the facial segments-based proposal generator, feature extractor, classifier and re-ranking based on prior probabilities of segments.

The general facial segment to face detector pipeline is shown schematically in Fig. 2. The figure depicts the integration of the proposal generation block into the pipeline. In the following sections, two instances of the detection model training block shown in the figure, namely SegFace and DeepSegFace, are discussed.

A. Proposal generation

The set of facial segments is denoted by S = {a_k | k = 1, 2, ..., M}, where M is the number of segments under consideration and a_k is a particular facial segment. M weak
M weak k and DeepSegFace, are discussed. Adaboost facial segment detectors are trained to detect each of the segments in S. After running all the segment detectors B. SegFace on an image, the detected face segments which produce nearby estimated face center are grouped into clusters CLj, SegFace is a fast and shallow face detector built from j ={1,2,...cI}asdiscussedin[13].Here,cI isthenumber segments proposal. For each segment in sk ∈S, a classifier ofclustersformedforimageI.Aboundingboxforthewhole C is trained to accept features f(s ) from the segment and face BCLj is calculated based on the constituent segments. gesnkerate a score denoting if a face iks present. Output scores For the generation of a proposal set, duplicate clusters that of C are stored in an M dimensional feature vector F , yieldexactlysameboundingboxesareeliminatedandatmost whersek, elements in F corresponding to segments that aCre C ζ face proposals are generated from each cluster by selecting not present in a proposal are set to 0. random subsets of face segments constituting that cluster. Another feature vector of size 2M+2 is constructed using Therefore, each proposal P is composed of a set of face several prior probability values as features from the training segments SP, where SP ∈P(S)−{∅} and P denotes the proposal set that represents the likeliness of certain segments power set. To get better proposals, one can impose extra and certain combinations. These values are requirements such as |S |>c, where |·| denotes cardinality and c is a threshold. EaPch proposal is also associated with • Fraction of total true faces constituted by proposal P, i.e. a bounding box for the whole face, which is the smallest |P ∈ΘF| bounding box that encapsulates all the segments in that , |ΘF| proposal. In our proposal generation scheme, M = 9 is used. where ΘF is the set of all proposals that return a true The nine parts under consideration are nose (Nose), eye- face. 
B. SegFace

SegFace is a fast and shallow face detector built from segment proposals. For each segment s_k ∈ S, a classifier C_sk is trained to accept features f(s_k) from the segment and generate a score denoting whether a face is present. The output scores of the C_sk are stored in an M-dimensional feature vector F_C, where elements of F_C corresponding to segments that are not present in a proposal are set to 0.

Another feature vector F_S of size 2M + 2 is constructed using several prior probability values as features, computed from the training proposal set, that represent the likeliness of certain segments and certain combinations. These values are:

• The fraction of total true faces constituted by proposal P, i.e. |P ∈ Θ_F| / |Θ_F|, where Θ_F is the set of all proposals that return a true face.
• The fraction of total mistakes constituted by proposal P, i.e. |P ∈ Θ_F̄| / |Θ_F̄|, where Θ_F̄ is the set of all proposals that are not faces.
• For each of the M facial segments s_k ∈ S, the fraction of total true face proposals of which s_k is a part, i.e. |s_k ∈ S_p; S_p ∈ Θ_F| / |Θ_F|, where k = 1, 2, ..., M.
• For each of the M facial segments s_k ∈ S, the fraction of total false face proposals of which s_k is a part, i.e. |s_k ∈ S_p; S_p ∈ Θ_F̄| / |Θ_F̄|, where k = 1, 2, ..., M.
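Under the assumption that |P ∈ Θ_F| counts training proposals whose segment combination matches P's exactly (one plausible reading of the bullets above, not spelled out in the text), the 2M + 2 prior features could be estimated like this:

```python
from collections import Counter

def prior_features(segments, true_props, false_props, all_segments):
    """Estimate the 2M + 2 prior features for one proposal's segment set.
    true_props / false_props: lists of segment-name sets for training
    proposals labelled face / non-face (Theta_F and Theta_F-bar)."""
    key = frozenset(segments)
    true_combos = Counter(frozenset(p) for p in true_props)
    false_combos = Counter(frozenset(p) for p in false_props)
    feats = [true_combos[key] / len(true_props),    # share of true faces with this combination
             false_combos[key] / len(false_props)]  # share of mistakes with this combination
    # per-segment membership rate among true proposals (M values) ...
    feats += [sum(s in p for p in true_props) / len(true_props) for s in all_segments]
    # ... and among false proposals (M values)
    feats += [sum(s in p for p in false_props) / len(false_props) for s in all_segments]
    return feats
```

The ordering of the 2M + 2 values here (two combination features, then the per-segment rates) is an assumption; the paper does not fix an order.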
F_C and F_S are appended together to form the full feature vector F of length 3M + 2. Then a master classifier C is trained on a training set of such labeled vectors {F_i, Y_i}, where Y_i denotes the label (face or no-face). Thus, C learns how to assign relative importance to different segments, and the likeliness of certain combinations of segments occurring, when deciding if a face is present in a proposal. SegFace thereby extends the face detection from segments concept in [13] using traditional methods to obtain reasonably accurate results. In our implementation of SegFace, HoG [4] features are used as f, and support vector machine (SVM) classifiers [2] are used as both C_sk and C for generating the segment-wise scores and the final detection score, respectively.

TABLE I
STRUCTURE OF DEEPSEGFACE'S CONVOLUTIONAL LAYERS (FEATURE EXTRACTION AND DIMENSIONALITY REDUCTION)

Segment | Input | Feature | Dim. Reduce | Flatten
Nose | 3×69×81 | 512×2×2 | 50×2×2 | 200
Eye | 3×54×162 | 512×1×5 | 50×1×5 | 250
UL34 | 3×147×147 | 512×4×4 | 50×4×4 | 800
UR34 | 3×147×147 | 512×4×4 | 50×4×4 | 800
U12 | 3×99×192 | 512×3×6 | 50×3×6 | 900
L34 | 3×192×147 | 512×6×4 | 50×6×4 | 1200
UL12 | 3×99×99 | 512×3×3 | 50×3×3 | 450
R12 | 3×192×99 | 512×6×3 | 50×6×3 | 900
L12 | 3×192×99 | 512×6×3 | 50×6×3 | 900

TABLE II
COMPARISON OF THE PROPOSED METHODS

Component | SegFace | DeepSegFace
Proposal generation | Clustering detections from cascade classifiers for facial segments | Clustering detections from cascade classifiers for facial segments
Low level features | HoG features | Deep CNN features
Intermediate stage | SVM for segment i outputs a score on HoG features of segment i | Dimension reduction and concatenation to a single 6400-D vector
Final classifier | SVM trained on scores from part SVMs and priors | Fully connected layer, followed by a softmax layer
Using priors | Used as features in the final SVM | Used for re-ranking of face probabilities in post-processing
Tradeoffs | Fast but less accurate | Slower but more accurate

C. DeepSegFace: CNN architecture for detecting faces from face segments

DeepSegFace is an architecture that integrates deep CNNs and segments-based face detection, allowing end-to-end training and exploiting the superior capabilities of deep CNNs for segment-based detection. At first, proposals consisting of subsets of the M = 9 parts, as discussed earlier, are generated for each image. DeepSegFace is then trained to calculate the probability of each proposal being a face. Finally, a re-ranking step adjusts the probability values from the network. The proposal with the maximum re-ranked score is deemed the detection for that image.
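The SegFace feature assembly described above can be written down directly; the sketch below assumes the absent-segment scores are exactly 0 and simply concatenates F_C with the 2M + 2 prior features to form the 3M + 2 input to the master classifier (the per-segment scoring and the SVM itself are left out).

```python
def segface_feature_vector(segment_scores, prior_feats, all_segments):
    """Assemble the (3M + 2)-dimensional input F for SegFace's master
    classifier C. segment_scores: dict mapping a present segment to its
    per-segment classifier score (e.g. an SVM score on HoG features);
    absent segments contribute 0, as described for F_C."""
    m = len(all_segments)
    assert len(prior_feats) == 2 * m + 2, "F_S must have 2M + 2 entries"
    f_c = [segment_scores.get(s, 0.0) for s in all_segments]  # F_C: M scores
    return f_c + list(prior_feats)                            # F: 3M + 2 values
```

With M = 9 as in the paper, the resulting vector has 29 entries.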
The architecture of DeepSegFace is arranged according to the classic paradigm in pattern recognition: feature extraction and dimensionality reduction followed by a classifier. Training occurs end-to-end (proposal to face probability). A simple block diagram of the architecture is shown in Fig. 3. The different components of the figure are discussed here.

Convolutional feature extraction: There are nine convolutional networks, one for each of the nine segments. Each of the nine columns is structurally similar to, and initialized with, the convolution layers of the VGG16 network [28]. Thus each network has thirteen convolution layers arranged in five blocks. Each segment in the proposal is resized to the standard dimensions for that segment, and then the VGG mean value is subtracted from each channel. For segments not present in the proposal, zero input is fed into the networks corresponding to those segments, as shown for the Nose segment in the figure.

Dimensionality reduction: Since the last convolutional layer of each network outputs 512 feature maps, the total number of features is quite large. Therefore, a randomly initialized convolutional layer with filter size 1×1 and fifty feature maps is added to learn an appropriate dimension reduction.

Classifier: The outputs of the dimensionality reduction block for each segment network are flattened and concatenated side by side to yield a 6400-dimensional feature vector. A fully connected (fc) layer of 250 nodes, followed by a softmax layer of two nodes (both randomly initialized), is added on top of the feature vector. The two outputs of the softmax layer correspond to the probabilities of being a face and of not being a face, and hence they sum to one.

Re-ranking: The DeepSegFace network outputs the face detection probability for each proposal in an image, which can be used to rank the proposals and then declare the highest-probability proposal as the face in that image. However, there is prior knowledge that some segments are more effective at detecting the presence of faces than others. This information is available from the prior probability values discussed in section III-B. In the case of SegFace, the statistical features are incorporated into the feature vector of the master SVM. Since these features act as fixed priors for the segments and their different combinations, for DeepSegFace these values are used to re-rank the final score by multiplying it with the mean of the statistical features.

More details on the dimensions of the DeepSegFace architecture at different stages are presented in Table I. Key insights about the structure of the network are provided in the next subsection.

Fig. 3. Block diagram showing the DeepSegFace architecture.

D. Interpretations

1) SegFace vs. DeepSegFace: One can think of SegFace and DeepSegFace as belonging to the same school of face detection algorithms, namely, detecting faces by pooling detections of facial segments. Both have broad structural similarities, as discussed in Table II. However, SegFace focuses on resource-constrained execution (no GPU) while DeepSegFace targets performance. Currently both of their input proposals come from the same proposal generation scheme described in section III-A. Hence, both strategies will benefit from improved proposal generation algorithms. Here a very fast proposal generation scheme is chosen over more sophisticated ones, mostly for processing speed. However, if processing power is not a bottleneck, one can customize more advanced proposal generation schemes such as Faster R-CNN [22] for this purpose.
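As a quick sanity check on the dimensions reported in Table I, the flattened per-segment map sizes after the 1×1, fifty-map reduction do concatenate to the 6400-dimensional vector fed to the 250-node fc layer; the bookkeeping below only reproduces numbers stated in the paper.

```python
# (channels, height, width) of each segment's maps after the 1x1 reduction (Table I)
reduced = {"Nose": (50, 2, 2), "Eye": (50, 1, 5), "UL34": (50, 4, 4),
           "UR34": (50, 4, 4), "U12": (50, 3, 6), "L34": (50, 6, 4),
           "UL12": (50, 3, 3), "R12": (50, 6, 3), "L12": (50, 6, 3)}

flat = {seg: c * h * w for seg, (c, h, w) in reduced.items()}  # flattened lengths
concat_dim = sum(flat.values())  # side-by-side concatenation -> fc(250) -> softmax(2)
print(flat["Nose"], flat["L34"], concat_dim)  # prints: 200 1200 6400
```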
2) Facial Segment Drop-out for Better Generalization: As mentioned in the proposal generation scheme, subsets of the face segments in a cluster are used to generate new proposals. For example, if a cluster of face segments contains n segments and each proposal must contain at least c segments, then it is possible to generate sum_{k=c}^{n} C(n, k) proposals, where C(n, k) is the binomial coefficient. Now, if all the facial segments are present, the network's task is easier. However, all nine parts are redundant for detecting a face, since there are significant overlaps among the parts. Also, in a given face, many segments are often not detected by the weak segment detectors. Thus, one can interpret the missing segments as 'dropped out', i.e. some of the input signals are randomly missing (they are set to zero). The network must therefore be robust to face segments 'dropping out', and it generalizes better in order to identify faces.
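The proposal count in the example above is a direct binomial sum, which can be evaluated as follows (a one-liner around the stated formula, nothing more):

```python
from math import comb

def num_possible_proposals(n, c):
    """Distinct proposals from a cluster of n detected segments when each
    proposal must contain at least c of them: sum_{k=c}^{n} C(n, k)."""
    return sum(comb(n, k) for k in range(c, n + 1))

print(num_possible_proposals(9, 2))  # 2**9 - C(9,0) - C(9,1) = 502
```

For the paper's nine segments and a minimum of two per proposal, a fully detected cluster thus admits 502 distinct segment subsets, of which at most ζ = 10 are sampled.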
E. Data augmentation

Training with subsets of detected proposals also has the additional effect of augmenting the data. It has been observed that around sixteen proposals are generated per image. Many of these proposals actually train the network to detect the same face using different combinations of segments. This is a more principled data augmentation technique compared to other commonly used methods, like adding noise, in which case it is not explicitly clear how the augmentation helps the network learn better.

IV. DATASET

Given the sensitive nature of smartphone usage data, there has been a scarcity of large datasets of front-camera images in natural settings. However, the following two datasets, published in recent years, provide a platform to evaluate partial face detection methods in real-life scenarios.

A. Active Authentication Dataset-01 (AA-01)

The AA-01 dataset [33] is a challenging dataset for the front-camera face detection task which contains front-facing camera face videos of forty-three male and seven female iPhone users under three different ambient lighting conditions: well-lit, dimly lit, and natural daylight. In each session, the users performed five different tasks: enrollment, scrolling, picture count, document reading and picture dragging. To evaluate the face detectors, face bounding boxes were annotated in a total of 8036 frames of the fifty users. This dataset, denoted AA-01-FD, contains 1607 frames without faces and 6429 frames with faces [13], [25]. The images in this dataset are semi-constrained, as the subjects perform a set task during the data collection period. However, they are not required or encouraged to maintain a certain posture, hence the dataset is sufficiently challenging due to pose variations, occlusions and partial faces.

B. University of Maryland Active Authentication Dataset-02 (UMDAA-02)

UMDAA-02 contains usage data from more than fifteen smartphone sensors obtained in a natural setting for an average of ten days per user [14]. The UMDAA-02 Face Detection Dataset (UMDAA-02-FD) contains a total of 33,209 images, manually annotated with face bounding boxes, from all sessions of the 44 users (33 male, 10 female) of UMDAA-02, sampled at an interval of 7 seconds. This dataset is truly unconstrained, as the data is collected during real-time phone usage over a period of one week. The face images have wide variations of pose and illumination, and it is observed that the faces are mostly large in size and that a good proportion of the face images are only partially visible.

TABLE III
COMPARISON AT 50% OVERLAP ON THE AA-01-FD AND UMDAA-02-FD DATASETS

Methods | AA-01: TAR at 1% FAR | AA-01: Recall at 99% Prec. | UMDAA-02: TAR at 1% FAR | UMDAA-02: Recall at 99% Prec.
NPD [12] | 29.51 | 11.0 | 33.49 | 26.79
DPM Baseline [16] | 85.08 | 83.25 | 78.48 | 72.79
DeepPyramid [20] | 66.17 | 42.35 | 71.19 | 66.07
HyperFace [21] | 90.52 | 90.32 | 73.01 | 71.14
FSFD C_best [13] | 59.06 | 55.65 | 55.74 | 26.88
SegFace | 67.12 | 63.09 | 66.44 | 61.47
DeepSegFace | 87.16 | 86.49 | 82.26 | 76.28

Fig. 4. Images without even one good proposal returned by the proposal generation mechanism. This bottleneck can be removed by using better proposal generation schemes.
V. EXPERIMENTAL RESULTS

A. Experimental Setup

The experimental results presented in this section demonstrate the effectiveness of the proposed methods over other state-of-the-art face detection algorithms. In particular, experimental results on the AA-01-FD and UMDAA-02-FD datasets are compared with a) the normalized pixel difference (NPD)-based detector [12], b) the HyperFace detector [21], c) the DeepPyramid deformable part model detector [20], d) the DPM baseline detector [16], and e) the facial segment-based face detector (FSFD) [13]. Both SegFace and DeepSegFace are trained on 3964 images from AA-01-FD, and the trained models are validated using 1238 images. The data augmentation process produces 57,756 proposals from the training set, which is around 14.5 proposals per image. The remaining 2835 images of AA-01-FD are used for testing. For UMDAA-02-FD, 32,642 images are used for testing. In all experiments with SegFace and DeepSegFace, c = 2 and ζ = 10 are used.

The results are evaluated by comparing the ROC and precision-recall curves of these detectors, since all of them return a confidence score for each detection. The goal is to achieve a high true acceptance rate (TAR) at a very low false acceptance rate (FAR), and also a high recall at a very high precision. Hence, numerically, the TAR at 1% FAR and the recall achieved by a detector at 99% precision are the two metrics used to compare the different methods.

In Table III, the performance of SegFace and DeepSegFace is compared with state-of-the-art methods on both datasets. From the measures on the AA-01-FD dataset, it can be seen that SegFace, in spite of being a traditional feature-based algorithm, outperforms several algorithms like FSFD and even DCNN-based algorithms such as NPD and DeepPyramid. On the other hand, DeepSegFace outperforms all the other methods except HyperFace in terms of the two evaluation measures on the AA-01-FD dataset. HyperFace is a state-of-the-art algorithm that is trained on over 20,000 images, compared to only 5202 images used to train DeepSegFace. Also, HyperFace uses R-CNN to generate face proposals, compared to the fast weak classifiers used by DeepSegFace.

Further analysis reveals that one of the bottlenecks of DeepSegFace's performance is the proposal generation phase; thus its performance can increase if it uses a more powerful proposal generation scheme, such as R-CNN [6]. In Fig. 4, some images are shown for which the proposal generator did not return any proposals, or returned proposals without sufficient overlap, even though there are somewhat good, visible faces or facial segments in them. The percentage of true faces that are represented by at least one proposal in the list of proposals for the training and test sets is counted; the result of this analysis is shown in Fig. 5. The bar graphs denote the percentages of positive and negative samples present in the proposal list generated at a certain overlap ratio. For example, out of the 57,756 proposals generated for training, there are approximately 62% positive samples and 35% negative samples at an overlap ratio of 50%. With the overlap ratio fixed at 50% for this experiment, it can be seen from the line plot in Fig. 5(b), corresponding to the AA-01-FD test set, that the proposal generator actually represents 89.18% of the true faces successfully and fails to generate a single good proposal for the rest of the images. Hence, the performance of the proposed detectors is upper-bounded by this number on this dataset, a constraint that can be mitigated by using advanced proposal schemes like R-CNN, which generates around 2000 proposals per image for HyperFace, compared to just around sixteen proposals generated by the fast proposal generator employed by DeepSegFace.

Fig. 5. (a) 57,756 train proposals from the AA-01-FD dataset, (b) 39,168 test proposals from the AA-01-FD dataset, and (c) 410,138 test proposals from the UMDAA-02-FD dataset. In all cases c = 2 and ζ = 10.

However, when considering the UMDAA-02-FD test set, which is completely unconstrained and has almost ten times more images than the AA-01-FD test set, this upper bound might not be so bad. From Fig. 5(c) it can be seen that the upper bound for UMDAA-02-FD is an 87.57% true positive value. Now, in Fig. 6, the ROC curves for this dataset are shown. It can be seen that the DeepSegFace method outperforms all the other methods, including HyperFace, by a large margin, even with the upper bound (the curve flattens around 87.5%). This is because all the traditional methods suffer so much more when detecting mobile faces in truly unconstrained settings that a true acceptance rate of even 87% is hard to achieve. It is to be noted that the data collection process for AA-01-FD was task-based [33] and hence supervised, while UMDAA-02-FD was collected in a completely natural setting [14].

Fig. 6. ROC curves comparing different face detection methods on the UMDAA-02 dataset.

The precision-recall curves for the UMDAA-02-FD dataset are shown in Fig. 7. It can be seen that the DeepSegFace method has much better recall at 99% precision than any other method. In both figures, the performance of SegFace is found satisfactory given its dependence on traditional features; in fact, its curves are not too far off from those of the DeepPyramid method, which is DCNN-based.

Fig. 7. Precision-recall curves comparing different face detection methods on the UMDAA-02 dataset.

The proposals for both test sets were analyzed, revealing that on average only three segments per proposal are present for both datasets. Thus, while there are nine convolutional networks in the architecture, only three of them need to fire on average to generate scores for the proposals. When forwarding proposals in batches of 256, DeepSegFace takes around 0.02 seconds per proposal on a GTX Titan-X GPU. SegFace takes around 0.49 seconds per proposal when running on an Intel Xeon E5-2623 v4 (2.60 GHz) machine with 32 GB memory without multi-threading, hence it should be possible to optimize it to run on mobile devices in reasonable time without specialized hardware.
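The two scalar metrics used above, TAR at a fixed FAR and recall at a fixed precision, can be computed from detection scores by a single threshold sweep. The sketch below is an illustrative implementation of these standard definitions, not the authors' evaluation code; it assumes binary labels (1 = face, 0 = no face) and ignores score ties.

```python
def tar_at_far(scores, labels, far_target=0.01):
    """True-acceptance rate at a fixed false-acceptance rate, found by
    sweeping the detection-score threshold from high to low."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    best = 0.0
    for _, y in pairs:
        tp += y
        fp += 1 - y
        if fp / n_neg <= far_target:          # FAR constraint still satisfied
            best = max(best, tp / n_pos)      # keep the highest TAR seen
    return best

def recall_at_precision(scores, labels, prec_target=0.99):
    """Highest recall achievable while precision stays at or above target."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    best = 0.0
    for _, y in pairs:
        tp += y
        fp += 1 - y
        if tp and tp / (tp + fp) >= prec_target:
            best = max(best, tp / n_pos)
    return best
```

In practice a library routine (e.g. an ROC or precision-recall curve implementation) would be used; the loop above just makes the two operating points explicit.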
VI. CONCLUSION AND FUTURE DIRECTIONS

This paper proposes two schemes, SegFace and DeepSegFace, for detecting faces captured by the front cameras of smartphones, primarily for the purpose of active authentication. By detecting faces from facial segments, the algorithms are well equipped to handle partial faces, which are prevalent in the mobile face domain. A principled data augmentation scheme for this class of algorithms is also proposed that makes the network generalize well. DeepSegFace, a DCNN architecture, performs very well on two mobile face datasets, while SegFace, which is designed for speed, works reasonably well and is simple and fast enough to be implemented on mobile platforms for real-time operation.

SegFace and DeepSegFace are but two instantiations of the general concept of detecting faces from proposals composed of facial segments. The general idea can be extended by using improved segment proposals or better classifiers. The detected faces come with several facial segments, which opens up the research opportunity to investigate facial segment-based fiducial estimation, attribute detection and face verification. Future research directions also include customizing and optimizing both algorithms for implementation on mobile platforms for real-life applications.

ACKNOWLEDGEMENT

This work was done in partnership with and supported by Google Advanced Technology and Projects (ATAP), a Skunk Works-inspired team chartered to deliver breakthrough innovations with end-to-end product development based on cutting-edge research, and a cooperative agreement FA8750-13-2-0279 from DARPA.

REFERENCES

[1] S. Bileschi and B. Heisele. Advances in component based face detection. In Analysis and Modeling of Faces and Gestures (AMFG 2003), IEEE International Workshop on, pages 149–156, Oct 2003.
[2] C. J. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, January 1998.
[3] H. Chan and W. Bledsoe. A man-machine facial recognition system: Some preliminary results. 1965.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), Volume 1, pages 886–893, Washington, DC, USA, 2005. IEEE Computer Society.
[5] S. S. Farfade, M. J. Saberian, and L.-J. Li. Multi-view face detection using deep convolutional neural networks. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval, ICMR '15, pages 643–650, New York, NY, USA, 2015. ACM.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '14, pages 580–587, Washington, DC, USA, 2014. IEEE Computer Society.
[7] C. Huang, H. Ai, Y. Li, and S. Lao. High-performance rotation invariant multiview face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):671–686, April 2007.
[8] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
[9] V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst, 2010.
[10] M. Kostinger, P. Wohlhart, P. Roth, and H. Bischof. Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 2144–2151, Nov 2011.
[11] S. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, and H. Shum. Statistical learning of multi-view face detection. In Computer Vision, ECCV 2002, volume 2353 of Lecture Notes in Computer Science, pages 67–81. Springer Berlin Heidelberg, 2002.
[12] S. Liao, A. K. Jain, and S. Z. Li. A fast and accurate unconstrained face detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):211–223, Feb. 2016.
[13] U. Mahbub, V. M. Patel, D. Chandra, B. Barbello, and R. Chellappa. Partial face detection for continuous authentication. In 2016 IEEE International Conference on Image Processing (ICIP), pages 2991–2995, Sept 2016.
[14] U. Mahbub, S. Sarkar, V. M. Patel, and R. Chellappa. Active user authentication for smartphones: A challenge data set and benchmark results. In Biometrics Theory, Applications and Systems (BTAS), 2016 IEEE 7th International Conference on, Sep. 2016.
[15] M. Marin-Jimenez, A. Zisserman, M. Eichner, and V. Ferrari. Detecting people looking at each other in videos. International Journal of Computer Vision, 106(3):282–296, 2014.
[16] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In ECCV, 2014.
[27] S. Yang, P. Luo, C. C. Loy, and X. Tang. From facial parts responses to face detection: A deep learning approach. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
[28] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[29] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '13, pages 3476–3483, Washington, DC, USA, 2013. IEEE Computer Society.
[30] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition (CVPR 2001), Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-511–I-518, 2001.
[31] S. Zafeiriou, C. Zhang, and Z. Zhang. A survey on face detection in the wild. Computer Vision and Image Understanding, 138(C):1–24, Sept. 2015.
[32] C. Zhang and Z. Zhang. A survey of recent advances in face detection. Technical Report MSR-TR-2010-66, Microsoft Research, June 2010.
[33] H. Zhang, V. Patel, M. Fathy, and R. Chellappa. Touch gesture-based active user authentication using dictionaries. In Applications of Computer Vision (WACV), 2015 IEEE Winter Conference on, pages 207–214, Jan 2015.
[17] C.McCool,S.Marcel,A.Hadid,M.Pietikainen,P.Matejka,J.Cer- nocky,N.Poh,J.Kittler,A.Larcher,C.Levy,D.Matrouf,J.-F.Bonastre, P.Tresadern,andT.Cootes. Bi-modalpersonrecognitiononamobile phone: using mobile phone data. In IEEE ICME Workshop on Hot TopicsinMobileMultimedia,July2012. [18] V.M.Patel,R.Chellappa,D.Chandra,andB.Barbello. Continuous userauthenticationonmobiledevices:Recentprogressandremaining challenges. IEEESignalProcessingMagazine,33(4):49–61,July2016. [19] D.Ramanan.Facedetection,poseestimation,andlandmarklocalization inthewild. InProceedingsofthe2012IEEEConferenceonComputer VisionandPatternRecognition(CVPR),CVPR’12,pages2879–2886, Washington,DC,USA,2012.IEEEComputerSociety. [20] R.Ranjan,V.M.Patel,andR.Chellappa. Adeeppyramiddeformable partmodelforfacedetection. InBiometricsTheory,Applicationsand Systems(BTAS),2015IEEE7thInt.Conf.,pages1–8,2015. [21] R.Ranjan,V.M.Patel,andR.Chellappa. Hyperface:Adeepmulti- tasklearningframeworkforfacedetection,landmarklocalization,pose estimation,andgenderrecognition. CoRR,abs/1603.01249,2016. [22] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-timeobjectdetectionwithregionproposalnetworks. InAdvances inNeuralInformationProcessingSystems(NIPS),2015. [23] T.Sakai,T.Nagao,andT.Kanade.Computeranalysisandclassification ofphotographsofhumanfaces. Seminar,pages55–62,October1973. [24] P. Samangouei and R. Chellappa. Convolutional neural networks forfacialattribute-basedactiveauthenticationonmobiledevices. In International Conference on Biometrics Theory, Applications and Systems(BTAS),Arlington,VA,Sept2016. [25] S. Sarkar, V. M. Patel, and R. Chellappa. Deep feature-based face detectiononmobiledevices. ArXive-prints,abs/1602.04868,2016. [26] X.Shen,Z.Lin,J.Brandt,andY.Wu. Detectingandaligningfacesby imageretrieval. InComputerVisionandPatternRecognition(CVPR), 2013IEEEConferenceon,pages3460–3467,June2013. [27] C. C. L. Shuo Yang, Ping Luo and X. Tang. From facial parts
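The evaluation above compares detectors by their recall at a fixed 99% precision operating point. As a minimal illustration of how such an operating point is read off a precision-recall sweep (this is plain NumPy with hypothetical scores and labels, not the paper's evaluation code):

```python
import numpy as np

def recall_at_precision(scores, labels, min_precision=0.99):
    """Best recall over all score thresholds whose precision meets the target."""
    order = np.argsort(-np.asarray(scores))   # sort detections by descending confidence
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                    # true positives if we keep the top-k detections
    fp = np.cumsum(1 - labels)                # false positives if we keep the top-k detections
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    feasible = precision >= min_precision     # operating points meeting the precision target
    return float(recall[feasible].max()) if feasible.any() else 0.0
```

For example, with scores [0.9, 0.8, 0.7, 0.6] and labels [1, 1, 0, 1], the highest recall reachable at precision ≥ 0.99 is 2/3, obtained by thresholding just above the lone false positive.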
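The runtime analysis rests on the observation that, although the architecture holds nine segment subnetworks, only about three fire for an average proposal. The conditional forwarding that exploits this could be sketched as follows (the subnetwork, feature sizes, and function names here are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

N_SEGMENTS = 9   # one convolutional subnetwork per facial segment
FEAT_DIM = 16    # illustrative per-segment feature size (assumed, not from the paper)

def pool_segment_features(segment_inputs, present_mask, subnet):
    """Forward only the segments present in a proposal; absent slots stay zero,
    so the downstream fusion layer always sees a fixed-length concatenation."""
    pooled = np.zeros((N_SEGMENTS, FEAT_DIM))
    for i in np.flatnonzero(present_mask):    # on average ~3 of the 9 slots are active
        pooled[i] = subnet(segment_inputs[i])
    return pooled.ravel()
```

Skipping the absent subnetworks is what keeps the average forward cost near a third of the worst case while preserving a fixed input size for score fusion.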
