Use of NLP for Telco Data (logs) Anonymization
Title | Use of NLP for Telco Data (logs) Anonymization |
Status | Applications closed |
Difficulty | Medium |
Description
This work explores the feasibility and effectiveness of NLP techniques for anonymizing Telco data (logs). The goal is to answer the following questions:
Are sufficient and usable datasets available in the public domain to carry out this work?
What techniques are currently used for anonymizing log-data?
Do NLP-based techniques offer any gain in efficiency compared with existing techniques?
Are the available libraries and tools (e.g., Presidio) sufficient?
What types of log-data are applicable for NLP-based techniques?
Does anonymizing log-data affect the ML techniques applied to it (e.g., predictive power, detection accuracy)?
Apart from answering the above questions, the outcome of this work also includes a tool that takes log-data and anonymizes it using an NLP-based approach.
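As a point of reference for the comparison against existing techniques, the following is a minimal sketch of a rule-based (regex) baseline, not the NLP-based tool itself. It uses only the Python standard library. The patterns, the `pseudonym` and `anonymize_line` helpers, the salt value, and the sample log lines are all hypothetical and chosen purely for illustration; real Telco logs would need a much broader set of identifiers.

```python
import hashlib
import re

# Hypothetical patterns for a rule-based baseline. EMAIL is applied first so
# that an address with an IP-like domain is replaced as a whole.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def pseudonym(label, value, salt="demo-salt"):
    """Map a sensitive value to a stable token such as <IP_1a2b3c4d>.

    The same (salt, value) pair always yields the same token, so relations
    between log lines survive. This is pseudonymization only: a truncated
    salted hash is an assumption made for brevity, not a strong guarantee.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
    return f"<{label}_{digest}>"


def anonymize_line(line, salt="demo-salt"):
    """Replace every match of every pattern in one log line."""
    for label, pattern in PATTERNS.items():
        line = pattern.sub(
            lambda m, label=label: pseudonym(label, m.group(0), salt), line
        )
    return line


# Made-up sample lines, not taken from any real dataset.
sample = [
    "2023-01-05 10:00:01 attach ue=10.0.0.1 mac=aa:bb:cc:dd:ee:ff",
    "2023-01-05 10:00:07 alarm raised by admin@example.com for 10.0.0.1",
]
anonymized = [anonymize_line(line) for line in sample]

for line in anonymized:
    print(line)
```

Because each value maps to the same token wherever it appears, counts and co-occurrences per device or user are preserved, which matters for the question of whether anonymization affects downstream ML. An NLP-based approach would aim to catch identifiers that fixed patterns like these miss, such as names embedded in free-text messages.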
Additional Information
Because this is a part-time request, Thoth is teaming up with the general Anuket project so that the intern can work on both projects.
Learning Objectives
Working on this project will help the student to:
1. Understand Telco data and the need for anonymization.
2. Understand different techniques and methodologies of anonymization.
3. Master the use of NLP for anonymization.
Expected Outcome
A tool that takes original data and outputs anonymized data.
Relation to LF Networking
Anuket, Thoth
Education Level
Undergrad (BE)
Skills
Python
Basics of Data Analytics and ML.
Basics of NLP.
Future plans
This tool will get merged with other anonymization techniques.
Preferred Hours and Length of Internship
3 Months Part-Time or 1.5 Months Full-Time (½ of the LF Mentorship Program duration).
Mentor(s) Names and Contact Info
Mentor: @Sridhar Rao srao@linuxfoundation.org, sridharkn, The Linux Foundation.
To apply, please do the following:
Send an email to the following:
Include your name, resume, and a statement of why you would be best for this project.
Due to the volume of applications, we may not respond until April 23rd.
Please be patient.