Internship Projects/Mentors
Description
This work explores the feasibility and effectiveness of NLP techniques for anonymizing telco data (logs). The goal is to answer the following questions:
- Are sufficient, usable datasets available in the public domain to carry out this work?
- What techniques are currently used for anonymizing log data?
- Do NLP-based techniques offer any efficiency gain over existing techniques?
- Are the available libraries and tools (e.g., Presidio) sufficient?
- Which types of log data are suitable for NLP-based techniques?
- Does anonymizing log data affect any ML techniques applied to it (e.g., their predictive power or detection accuracy)?
Apart from answering the above questions, the outcome of this work also includes a tool that takes log data and anonymizes it using an NLP-based approach.
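For orientation, the kind of existing rule-based technique that an NLP-based tool would be compared against can be sketched as below. This is a minimal illustration and not the project's tool: the patterns, the salt, and the names `PATTERNS`, `pseudonym`, and `anonymize_line` are all hypothetical, and only Python's standard library is used. Each identifier is replaced with a salted hash token, so the same value always maps to the same placeholder, which is one way to keep some structure available for downstream ML.

```python
import hashlib
import re

# Hypothetical patterns for illustration only; real telco logs would need
# many more identifier types (IMSI, MSISDN, hostnames, ...).
# EMAIL is listed first so an address is replaced as a whole before the
# IP pattern gets a chance to match part of it.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MAC": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def pseudonym(kind, value, salt="demo-salt"):
    """Map a value to a stable placeholder such as <IP_1a2b3c4d>."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
    return f"<{kind}_{digest}>"


def anonymize_line(line, salt="demo-salt"):
    """Replace every match of every pattern in one log line."""
    for kind, pattern in PATTERNS.items():
        line = pattern.sub(
            lambda m, k=kind: pseudonym(k, m.group(0), salt), line
        )
    return line


if __name__ == "__main__":
    sample = "conn from 10.0.0.1 mac 00:1A:2B:3C:4D:5E user alice@example.com"
    print(anonymize_line(sample))
```

A baseline like this only catches identifiers that have a fixed shape. Free-text fields such as names or locations inside log messages are where an NLP-based recognizer would be expected to add value.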
Additional Information
Because the request is for a part-time position, Thoth is teaming up with the general Anuket project so that the intern can work on both projects.
Learning Objectives
Expected Outcome
Relation to LF Networking
Anuket, Thoth
Education Level
Skills
- Python
- Basics of data analytics and ML
- Basics of NLP
Future plans
Preferred Hours and Length of Internship
3 Months Part-Time or 1.5 Months Full-Time (½ of the LF Mentorship Program duration).
Mentor(s) Names and Contact Info
Sridhar Rao, srao@linuxfoundation.org