International Journal of Pure and Applied Mathematics
Volume 119 No. 12 2018, 13409-13421
ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue

Implementation of Deduplication on Encrypted Big-data using Signcryption for cloud storage applications

Saravanan Palani*, Sangeetha E, Archana A
School of Computing, SASTRA Deemed University, India
*Corresponding Author

ABSTRACT

As Big Data cloud storage servers become widespread, the shortage of disk space within the cloud becomes a major concern. The elimination of duplicate or redundant data, particularly in computer data, is called deduplication. Data deduplication is a method to control the explosive growth of information within cloud storage, and most storage providers are looking for more secure and efficient methods for handling sensitive data. Recently, a noteworthy technique referred to as signcryption has been proposed, in which the properties of both signature (ownership) and encryption are implemented simultaneously with better performance. For deduplication, we introduce a method that can eliminate redundant encrypted data owned by different users. Furthermore, we generate a tag, which is the key component of big data management. We propose a digital signature technique for ownership verification. Convergent encryption, also called a content hash key cryptosystem, is an encryption approach that supports deduplication. With this technique, the encryption key is generated from a hash of the plaintext. Therefore, identical plaintexts produce the same ciphertext.

Keywords: Big data, digital signature, cloud, data deduplication, convergent encryption

1. INTRODUCTION

Cloud storage is an online storage service that provides data maintenance, data management, and data backup. Users can store their files online and access them from anywhere over a network connection. A recent survey states that the number of business organizations that gained an advantage from cloud adoption is nearly double the previously reported figure, and it is estimated that by 2017 the cloud services business may exceed $244 billion. Some of the most commonly known cloud services are Dropbox, Google Drive, etc. Each cloud service allows a specific bandwidth. If a corporation exceeds the given bandwidth, extra charges are applied. Some suppliers permit unlimited storage. This is one of the issues that corporations ought to take into account when choosing a cloud storage provider [15-24].

Privacy is a major factor to consider. Cloud encryption is the conversion of a cloud user's data into ciphertext [25-32]. Cloud storage providers offer services such as cloud encryption, in which the user's data is encrypted before it is stored in the cloud [33-39]. Encryption scrambles the user's data into a form that cannot be decrypted without knowledge of the cryptographic key. Data security can be effectively achieved by encryption. Encryption and secure encryption key management allow only authorized users to access the uploaded data. The encrypted data is meaningless without its respective key.

2. RELATED WORK

Hybrid data deduplication in cloud environment [1]. The authors propose a mechanism called hybrid data deduplication. The solution provides partial semantic security. In this system, the CSP knows the encryption key of the data, so the system model is not applicable where the cloud storage supplier cannot be trusted. The main advantage of this technique is that it supports deduplication on plaintext and ciphertext. The main disadvantage of this technique is that it does not support encrypted data deduplication. The proposed scheme uses client-end deduplication, and the encryption is done at the client. In this way, the computation load is shared by all the clients.

Encrypted Data Deduplication in Cloud Storage [2]. In the proposed model, the cloud server eliminates duplicated ciphertexts, which ultimately improves privacy protection. The security of the ciphertext is improved compared with existing encryption models. Usually the encryption keys are hash values of the plaintext, but here the encryption keys are randomly chosen. The cloud server has no information except the hash value.

DupLESS: Server aided encryption for deduplicated Storage [3]. The authors propose an architecture called DupLESS that resists brute-force attacks through secure deduplicated storage. Messages are encrypted via a PRF protocol using a key obtained from the server. The duplication check is done by an existing service on the client's behalf. The main advantage of this technique is that brute-force attacks are avoided, and clients can encrypt their data with a key server that is separate from the storage server. The main drawback of this technique is that flexibility to other data users cannot be provided.

Policy-based deduplication [4]. A policy-based deduplication model was proposed based on trust relationships established among cloud storage components, deduplication-related components, and different security requirements. The proposed system model has a key management mechanism based on a proxy re-encryption algorithm for decryption of deduplicated data and data access. Even if a malicious party tries to access the data stored in the cloud, it cannot gain knowledge about the map box and the lockbox decryption private key.
If the chunk key becomes known, however, it may result in an information leak.

Efficient Deduplication On encrypted Big Data in Cloud [5]. The authors present an efficient scheme in which the data is encrypted using a proxy re-encryption technique and a possession challenge, then stored in the cloud. The results show the scheme's effectiveness and good performance in deduplication of cloud storage data.

Deduplication based on Hadoop [6]. This paper proposes an integrated deduplication approach that takes the features of Hadoop into account and leverages parallelism based on MapReduce and HBase to speed up the deduplication procedure. The approach introduces a new small-file aggregation scheme and employs Keccak, the new Secure Hash Algorithm-3 standard. The overall deduplication procedure is implemented on MapReduce and HBase to provide scalability for the challenges brought by the big data era.

When there are too many files to process, the performance of the tar-to-seq tool may degrade sharply because of a memory or I/O bottleneck in a single node.

Attribute-Based Storage Supporting Secure Deduplication of Encrypted Data in Cloud [7]. An attribute-based storage scheme was proposed in a hybrid cloud setting for secure deduplication. The public cloud is responsible for storage. The private cloud manages the duplication check. Confidential data sharing is done efficiently by specifying access policies. Semantic security, the standard notion for data confidentiality, was achieved. Key management is another major challenge in cloud storage. In [8], a security matrix and magic cards are used to identify the right binary pattern from the input stream.

A Hybrid Cloud Approach for Secure Authorized Deduplication [9]. The authorized data deduplication issue was first addressed in this paper. In addition to the data, the user's different privileges are considered in the duplication check. The proposed model is a hybrid cloud architecture that supports authorized duplication check [10]. The proposed scheme has minimal overhead, reduces storage space, and saves network bandwidth. The key management overhead issue was not addressed in the paper.

A Secure Client Side Deduplication Scheme in Cloud Storage Environments [11]. A scheme based on OpenStack Swift was proposed for secure storage and data sharing via the public cloud. It uses client-side deduplication. Confidentiality against unauthorized users is better ensured: the user itself computes the key required for the respective data encryption [12]. The data owner manages the data access. An authorized user can decrypt an encrypted file with the private key only after the access rights have been integrated. The proposed model was successful only for hash key encryption. Deduplication has also been used in cloud storage auditing [13] and in improving the access mechanism with higher efficiency [14].

The model proposed in this paper uses convergent encryption and digital signature verification for deduplication on big encrypted data. Confidentiality of data is better ensured. The proposed model provides good performance in achieving deduplication on encrypted big data.

3. PROPOSED APPROACH

Data storage is a popular cloud service. To respect the privacy of the user, the data is encrypted first and then stored in the cloud. Encrypted storage raises new challenges for data deduplication in the cloud, which leads to difficulties in big data storage.

Fig 1. Proposed model

In this paper, the scheme is proposed by applying CE (convergent encryption) based on digital signature verification. In the proposed scheme, the data holders do not need to be online for the duplication check.

The scheme has three entities:

1) The data holder uploads the data to be saved in the cloud. In this scheme, a data holder may try to upload data that another data holder has already stored in the cloud storage in encrypted form.

2) The Cloud Storage Provider (CSP) stores the encrypted data. In this scheme, the CSP is used only for storage because it cannot be fully trusted. The trust issue arises from the curiosity of the cloud about the stored data.

3) The authorized party (AP) performs data ownership verification and duplication checking. The data holders completely trust AP.

3.1. PRELIMINARY AND NOTATION

3.1.1 PRELIMINARIES

1) Convergent Encryption

The convergent encryption technique is represented as (KG, E, D), where KG is the key generation, E is the encryption, and D is the decryption. On input of the hash key K and the message M, the encryption algorithm E generates a ciphertext C, where C = E(M, K). On input of the hash key K and the ciphertext C, the decryption algorithm D generates the plaintext M, where M = D(C, K).

2) Symmetric Encryption

The encryption algorithm takes two parameters, namely the plaintext M and the secret key S. It encrypts the input M with the secret key S and generates the ciphertext CT, i.e. CT = Encrypt(M, S). This process is conducted at the respective data owner U; the encrypted data is then uploaded to the cloud storage provider CSP.

The decryption algorithm takes two parameters, namely the ciphertext CT and the secret key S. It decrypts the ciphertext CT with the secret key S and generates the plaintext M, i.e. M = Decrypt(CT, S). This process recovers the plaintext of the encrypted data stored in the cloud.

3.1.2 NOTATION

Table 1 lists the parameters used in the proposed model.

Key               Description
M                 The plaintext
CT                The ciphertext
H()               The hash function
KG                The key generation algorithm
K                 The hash key
T()               The tag generation function
T                 The tag used for data duplication check
S                 The secret key
(PK1, PK2)        The public key and private key for ownership verification
E                 The encryption
D                 The decryption
SIG               The digital signature
HD                The hashed document
Encrypt(M, S)     The encryption function on M using secret key S
Decrypt(CT, S)    The decryption function on CT
CS(HD, PK2)       The signature creation function
VS(HD, SIG, PK1)  The signature verification function

4. SCHEME

The proposed model has the following aspects:

1) Data upload: AP does the duplication check. If the check is negative, the data holder encrypts the data with the secret key S given by AP. The encrypted data is then stored at CSP.

2) Data duplication: when a data holder tries to store data that is already stored in the cloud storage, duplication occurs. AP does the duplication check using tag comparison. If the comparison is positive, AP contacts the data holder and informs it that duplication has occurred.

3) Data download: if the data holder wants to decrypt particular data stored at CSP, it requests AP to provide the key needed to decrypt that data. AP performs a data holder eligibility check. If the check is positive, AP issues the secret key that the data holder will use to decrypt the encrypted data.

4.1 DATA OWNERSHIP VERIFICATION SCHEME

In this paper, data ownership verification is based on the Rivest-Shamir-Adleman (RSA) scheme. AP checks the data holder's eligibility to make sure that a real party is uploading the data. Here H(M) cannot be disclosed by CSP.

4.2 PROCEDURE

4.2.1 Data deduplication

The procedure for data deduplication is given by the following steps:

Step 1) User U1 selects the data M it wants to upload.

Step 2) User U1 generates a tag for the data M, t1 = H(E(M, H(M))), i.e. the hash of the ciphertext obtained by encrypting M under the hash key K = H(M), with E(M, K) as defined in Section 3.1.1.

Step 3) AP does the duplication check: it checks whether the tag t1 already exists. If the check is negative, the data holder encrypts the data M with the secret key S given by AP, and the encrypted data is then stored at CSP. If the check is positive, AP contacts the data holder and informs it that duplication has occurred.

Step 4) When user U2 requests to upload the same data M already encrypted and uploaded by user U1, it follows steps 1 and 2. In step 3, duplication is detected because U2's tag t2 already exists (it equals t1), so U2 is informed about the situation.

4.2.2 Ownership verification

Step 1) The public key and the private key (PK1, PK2) are generated for each message to be uploaded by the data holder.

Step 2) Signature creation: the signature is created on the input of the hashed message (HD) using the private key PK2.

Step 3) AP verifies the signature on the input of the hashed message HD and the signature SIG, using the public key PK1.

Step 4) If the verification is positive, AP proceeds to check for duplication. If the verification is negative, AP concludes that it is an attempt by a malicious party.

4.2.3 Data ownership verification

The data owner logs into its account and selects the document that it wants to upload to the cloud. Then, the document is hashed. A public and private key pair is generated for the selected document. Next, the digital signature is created for the file using the hashed document and the private key. AP verifies the signature using the corresponding public key. If the verification is true, AP proceeds to check for duplication.

Fig 2. Data ownership verification

4.2.4 Data deduplication:

The data owner logs into its account and selects the document that it wants to upload to the cloud. A digital signature is created for the particular document, and AP verifies it. If the verification is true, then AP checks for duplication. The process is done as follows:

1) A hash key is derived from the selected document.
2) The ciphertext of the document is produced by encrypting the selected document with the hash key.
3) A tag is generated from the resulting ciphertext.
4) The created tag is compared with the other tags present in AP. If the comparison is false, the document is uploaded to the cloud.

Fig 3. Data deduplication

4.2.5 Data Encryption:

The data owner logs into its account and selects the document that it wants to upload to the cloud. A digital signature is created for the particular document, and AP verifies it. If the verification is true, then AP checks for duplication. If the comparison is false, AP generates a secret key. The data owner encrypts the selected document with the secret key and then uploads it to the cloud.

Fig 4. Data Encryption

4.2.6 Data decryption

A data holder logs into its account and selects the document that it wants to decrypt. It requests AP to share the secret key for the selected document by sending its digital signature for verification. AP verifies the digital signature. If the verification is true, AP shares the secret key with the respective data holder. The data holder can then decrypt the selected document using the respective secret key.

Fig 5. Data Decryption

5. PERFORMANCE EVALUATION

The proposed model was implemented and tested for performance evaluation. The time taken to select the data to be uploaded and downloaded was not taken into consideration. Testing focused mainly on the designed algorithms and on the performance of the deduplication procedure.

5.1 Data Encryption and Decryption:

Data encryption and decryption were done by using AES. The operation time was estimated using an AES key size of 256 bits for different data sizes (ranging from 80MB to 150MB). It was observed that the times taken to encrypt and decrypt data as big as 150MB were 64 milliseconds and 60 milliseconds respectively.

Fig 6. Data Encryption and decryption

As the size of the data increased, the time taken for data encryption and decryption also increased. Data encryption and decryption were efficiently done by using AES.

5.2 Convergent Encryption:

Fig 7. Convergent Encryption

Each operation of convergent encryption was tested under different AES key sizes: 128 bits, 192 bits, and 256 bits. For data of 100MB, the times taken for key generation, encryption, tag generation, and decryption were observed. For the different AES key sizes, the encryption time was less than 44 milliseconds. The decryption time was less than 41 milliseconds. The computation time of each operation did not differ much across key sizes. The proposed model was efficiently applicable in various scenarios.

5.3 Data ownership verification:

Fig 8. Data ownership verification

For each step in data ownership verification, the operation time was estimated using 2048-bit RSA. For data of 100MB, the times taken for key generation, signature creation, and signature verification were observed. The time taken for key generation was 11 milliseconds. The time taken for creation of the signature was 16 milliseconds. The time taken for verification of the signature was 15 milliseconds.

In the proposed model, the data holder did not need to proceed with data encryption if the data had already been uploaded by another data holder.

References

[1] C. Fan, S. Y. Huang, and W. C. Hsu, "Hybrid data deduplication in a cloud environment," in Proc. Int. Conf. Inf. Secur. Intell. Control, 2012, pp. 174-177.

[2] "Encrypted Data Deduplication in Cloud Storage," Information Security (AsiaJCIS), 2015 10th Asia Joint Conference on, Issue Date: 24-26 May 2015.

[3] M. Bellare, S. Keelveedhi, and T. Ristenpart, "DupLESS: Server aided encryption for deduplicated Storage," in Proc. 22nd USENIX Conf. Secur., 2013, pp. 179-194.

[4] Chuanyi Liu, Xiaojian Liu, and Lei Wan, "Policy-based deduplication in secure cloud storage," in Proc. Trustworthy Comput. Serv., 2013, pp. 250-262, doi:10.1007/978-3-642-35795-4_32.

[5] Y. Saritha and A. Vineela, "Efficient Deduplication On encrypted Big Data in Cloud," International Journal of emerging trends and technology in computer science, vol. 6, no. 4, 2017.

[6] Zhang, D., Liao, C., Yan, W., Tao, R., & Zheng, W. (2017, August). Data Deduplication Based on Hadoop. In Advanced Cloud and Big Data (CBD), 2017 Fifth International Conference on (pp. 147-152). IEEE.

[7] Hui Cui, Robert H. Deng, Yingjiu Li, and Guowei Wu, "Attribute Based Storage supporting secure deduplication of encrypted data in cloud," IEEE Transactions on Big Data, January 2017.

[8] Sekar, K. R., Saravanan, P., & Sethuraman, J. (2006). Discovery of right binary pattern in key management using security matrix and magic cards. ARPN Journal of Engineering and Applied Sciences, 11(15), pp. 9187-9194.

[9] Jin Li, Yan Kit Li, Xiaofeng Chen, Patrick P.C. Lee, and Wenjing Lou, "A Hybrid Cloud Approach for Secure Authorized Deduplication," IEEE Transactions On Parallel And Distributed Systems, Vol. 26, No. 5, May 2015.

[10] C. Fan, S. Y. Huang, and W. C. Hsu, "Hybrid data deduplication in cloud environment," in Proc. Int. Conf. Inf. Secur. Intell. Control, 2012, pp. 174-177, doi:10.1109/ISIC.2012.6449734.

[11] Nesrine Kaaniche and Maryline Laurent, "A Secure Client Side Deduplication Scheme in Cloud Storage Environments," IEEE Transactions on Mobility and Security (NTMS) in Cloud Computing, Issue Date: April 2, 2014.

[12] C. Yang, J. Ren, and J. F. Ma, "Provable ownership of file in deduplication cloud storage," in Proc. IEEE Global Commun. Conf., 2013, pp. 695-700, doi:10.1109/GLOCOM.2013.6831153.

[13] J. W. Yuan and S. C. Yu, "Secure and constant cost public cloud storage auditing with deduplication," in Proc. IEEE Int. Conf. Communic. Netw. Secur., 2013, pp. 145-153, doi:10.1109/CNS.2013.6682702.

[14] T. Y. Wu, J. S. Pan, and C. F. Lin, "Improving accessing efficiency of cloud storage using de-duplication and feedback schemes," IEEE Syst. J., vol. 8, no. 1, pp. 208-218, Mar. 2014, doi:10.1109/JSYST.2013.2256715.

[15] Indragandhi, V., Logesh, R., Subramaniyaswamy, V., Vijayakumar, V., Siarry, P., & Uden, L. (2018). Multi-objective optimization and energy management in renewable based AC/DC microgrid. Computers & Electrical Engineering.

[16] Subramaniyaswamy, V., Manogaran, G., Logesh, R., Vijayakumar, V., Chilamkurti, N., Malathi, D., & Senthilselvan, N. (2018). An ontology-driven personalized food recommendation in IoT-based healthcare system. The Journal of Supercomputing, 1-33.

[17] Arunkumar, S., Subramaniyaswamy, V., & Logesh, R. (2018). Hybrid Transform based Adaptive Steganography Scheme using Support Vector Machine for Cloud Storage. Cluster Computing.

[18] Indragandhi, V., Subramaniyaswamy, V., & Logesh, R. (2017). Resources, configurations, and soft computing techniques for power management and control of PV/wind hybrid system. Renewable and Sustainable Energy Reviews, 69, 129-143.

[19] Ravi, L., & Vairavasundaram, S. (2016). A collaborative location based travel recommendation system through enhanced rating prediction for the group of users. Computational intelligence and neuroscience, 2016, Article ID: 1291358.

[20] Logesh, R., Subramaniyaswamy, V., Malathi, D., Senthilselvan, N., Sasikumar, A., & Saravanan, P. (2017). Dynamic particle swarm optimization for personalized recommender system based on electroencephalography feedback. Biomedical Research, 28(13), 5646-5650.

[21] Arunkumar, S., Subramaniyaswamy, V., Karthikeyan, B., Saravanan, P., & Logesh, R. (2018). Meta-data based secret image sharing application for different sized biomedical images. Biomedical Research, 29.

[22] Vairavasundaram, S., Varadharajan, V., Vairavasundaram, I., & Ravi, L. (2015). Data mining-based tag recommendation system: an overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(3), 87-112.

[23] Logesh, R., Subramaniyaswamy, V., & Vijayakumar, V. (2018). A personalised travel recommender system utilising social network profile and accurate GPS data. Electronic Government, an International Journal, 14(1), 90-113.

[24] Vijayakumar, V., Subramaniyaswamy, V., Logesh, R., & Sivapathi, A. (2018). Effective Knowledge Based Recommender System for Tailored Multiple Point of Interest Recommendation. International Journal of Web Portals.

[25] Subramaniyaswamy, V., Logesh, R., & Indragandhi, V. (2018). Intelligent sports commentary recommendation system for individual cricket players. International Journal of Advanced Intelligence Paradigms, 10(1-2), 103-117.
