Internship Projects/Mentors
Description
This work explores the feasibility and effectiveness of NLP techniques for anonymizing telco data (logs). The goal is to answer the following questions:
- Are sufficient, usable datasets available in the public domain to carry out this work?
- What techniques are currently used for anonymizing log data?
- Are NLP-based techniques more efficient than existing techniques?
- Are the available libraries and tools (e.g., Presidio) sufficient?
- What types of log data are suitable for NLP-based techniques?
- Does anonymizing log data affect the performance (e.g., predictive power, detection accuracy) of ML techniques applied to it?
Apart from answering the above questions, the outcome of this work also includes a tool that takes log data as input and anonymizes it using an NLP-based approach.
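For orientation, the kind of rule-based baseline that an NLP-based tool would be compared against can be sketched as follows. This is only an illustrative sketch, not the project's tool: the patterns, the placeholder labels, the function name `anonymize_line`, and the sample log line are all hypothetical, and real telco logs would likely need many more identifier types.

```python
import re

# Hypothetical patterns for illustration only; these are assumptions,
# not a complete list of identifiers found in telco logs.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("MAC", re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")),
]


def anonymize_line(line: str) -> str:
    """Replace each matched identifier with a typed placeholder such as <IPV4>."""
    for label, pattern in PATTERNS:
        line = pattern.sub(f"<{label}>", line)
    return line


# Made-up log line, used only to show the effect of the substitution.
sample = "2023-01-05 10:02:11 attach request from 10.0.0.12 user=alice@example.com"
print(anonymize_line(sample))
# prints: 2023-01-05 10:02:11 attach request from <IPV4> user=<EMAIL>
```

A baseline like this only catches identifiers with a fixed shape. Free-text fields such as names or locations are where an NLP-based recognizer would be expected to add value, which is what the questions above set out to evaluate.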
Additional Information
Learning Objectives
Expected Outcome
Relation to LF Networking
Anuket, Thoth
Education Level
Skills
- Python
- Basics of data analytics and ML
- Basics of NLP
Future plans
Preferred Hours and Length of Internship
3 Months Part-Time or 1.5 Months Full-Time (½ of the LF Mentorship Program duration).
Mentor(s) Names and Contact Info
Sridhar Rao, srao@linuxfoundation.org