Internship Projects/Mentors
Description
This work explores the feasibility and effectiveness of NLP techniques for anonymizing Telco data (logs). The goal is to answer the following questions:
- Are sufficient, usable datasets available in the public domain to carry out this work?
- What techniques are currently used for anonymizing log data?
- Do NLP-based techniques offer any improvement in efficiency over existing techniques?
- Are the available libraries and tools (e.g., Presidio) sufficient?
- What types of log data are suitable for NLP-based techniques?
- Does anonymizing log data affect the performance (e.g., predictive power, detection accuracy) of ML techniques applied to it?
Apart from answering the above questions, the outcome of this work also includes a tool that takes log data and anonymizes it using an NLP-based approach.
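As a point of reference for the comparison against existing techniques, a rule-based baseline can be sketched in a few lines of Python. This is only an illustrative sketch and not the project's tool: the patterns, the `pseudonym` and `anonymize_line` helpers, and the salt value are all assumptions made for this example, and real Telco logs will contain many more identifier types.

```python
import hashlib
import re

# Hypothetical patterns chosen for illustration only; they are not
# exhaustive and are not taken from the project.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}


def pseudonym(kind, value, salt="demo-salt"):
    """Map a sensitive value to a stable placeholder such as <IP_1a2b3c4d>."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
    return f"<{kind}_{digest}>"


def anonymize_line(line, salt="demo-salt"):
    """Replace every pattern match in one log line with its pseudonym."""
    for kind, pattern in PATTERNS.items():
        line = pattern.sub(
            lambda m, k=kind: pseudonym(k, m.group(0), salt), line
        )
    return line


if __name__ == "__main__":
    sample = "2023-01-01 12:00:00 login from 10.0.0.1 user=alice@example.com"
    print(anonymize_line(sample))
```

Because the same input value always maps to the same placeholder, correlations between log lines are preserved, which matters when checking whether anonymization affects downstream ML techniques. An NLP-based approach would aim to catch identifiers that fixed patterns like these miss.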
Additional Information
Because this request is for a part-time position, Thoth is teaming up with the general Anuket project so that the intern can work on both projects.
Learning Objectives
Working on this project will help the student to:
1. Understand Telco data and the need for anonymization.
2. Understand different anonymization techniques and methodologies.
3. Master the use of NLP for anonymization.
Expected Outcome
A tool that takes original data and outputs anonymized data.
Relation to LF Networking
Anuket, Thoth
Education Level
Undergrad (BE)
Skills
- Python
- Basics of data analytics and ML
- Basics of NLP
Future plans
This tool will be merged with other anonymization techniques.
Preferred Hours and Length of Internship
3 Months Part-Time or 1.5 Months Full-Time (½ of the LF Mentorship Program duration).
Mentor(s) Names and Contact Info
Sridhar Rao, srao@linuxfoundation.org, sridharkn, The Linux Foundation.