Daniel Laufer
Physicist | Senior Consultant @ EY | NLP and People Data Scientist | Machine Learning and Software Engineer
In what I expect will be considered a seminal work on language models, a particularly interesting result the authors establish is an upper bound on the number of input tokens a self-attention head can take before some desired outputs are necessarily excluded from its reachable output set. This has interesting implications in settings where "lost in the middle" forgetfulness effects start coming into play: controllability, while achievable, comes not only at greater computational cost, but also with the risk that a self-attention head may select, but subsequently forget, pertinent information necessary to arrive at a desired output. Thanks for sharing, David!
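The formal bound lives in the paper, but the intuition can be sketched with a toy softmax calculation (plain Python; the function names are my own, not the paper's): with bounded score gaps, the weight a head can concentrate on any single token shrinks as the context grows, so relevant tokens get diluted among distractors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_attention_weight(scores):
    """Largest weight any single token receives from this toy head."""
    return max(softmax(scores))

# One "relevant" token with score 1.0 among n-1 zero-score distractors:
# its weight is e / (e + n - 1), which decays as the context grows.
for n in (8, 64, 512):
    scores = [1.0] + [0.0] * (n - 1)
    print(n, round(max_attention_weight(scores), 3))
```

This is only an illustration of the dilution effect, not a reproduction of the paper's reachability argument, which concerns what outputs remain attainable at all.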
David Sauerwein
AI/ML at AWS | PhD in Quantum Physics
Glad you find it useful. I hope this gets picked up by many researchers. I think there's lots of room to find interesting results that could work for "post-transformer" models too.
My talented better half published her latest podcast episode about sustainable fashion! I'm so proud 💙💛 #fashionpsychology #startup #entrepreneurship
I'm honoured to have been selected to take part in Orbition Group's Driven by Data Mentorship Program! I look forward to taking part, meeting like-minded data professionals, and sharing in what will no doubt be some very thought-provoking conversations!
With the increasingly wide adoption of large language models, you may wonder how such models actually generate their responses. In neural language generation, there turn out to be a number of distinct ways to go from computing the softmax score that a given vocabulary token is the next token, to selecting the token that is actually produced. Each of these has trade-offs, typically between the computational cost of the selection process, its speed, and the quality of the text produced. Thank you, Damien, for sharing!
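As a toy sketch of that trade-off spectrum (plain Python over hypothetical logits; function names are my own, not any particular library's API): greedy decoding always takes the argmax, which is cheap and deterministic, while top-k sampling trades determinism for diversity by sampling among the k most likely tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; lower
    temperature sharpens it, higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def greedy(logits, vocab):
    """Always pick the highest-scoring token: fast and deterministic,
    but prone to repetitive text."""
    return vocab[max(range(len(logits)), key=lambda i: logits[i])]

def top_k_sample(logits, vocab, k=2, temperature=1.0, rng=random):
    """Restrict sampling to the k highest-scoring tokens, then
    renormalise: a middle ground between greedy and full sampling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top], temperature)
    return vocab[rng.choices(top, weights=probs)[0]]

vocab = ["the", "cat", "sat"]
logits = [2.0, 1.0, 0.1]
print(greedy(logits, vocab))  # "the"
print(top_k_sample(logits, vocab, k=2))
```

Nucleus (top-p) sampling works the same way but keeps the smallest token set whose cumulative probability exceeds p instead of a fixed k.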
In the realm of language modeling, NuMind's approach to building task-specialized foundation models for structured entity extraction stands out. Their focus on incremental improvement (as evidenced by their work on the NuNER model family for named entity extraction tasks) and on quality, creating performant yet compact models (with their largest model comparable to GPT-4o at roughly a tenth of the parameters), is truly commendable. NuMind's latest blog post on NuExtract, their foundation model for structured extraction, offers valuable insights into their approach to designing a language model from first principles. Dive into the details here: #NuMind https://lnkd.in/eWfE9Zk9
A lovely idea brought to fruition by lovely people! I'm looking forward to it, Kinsey and Dan!
My multi-talented better half is going to be on the telly! Please tune in to what's going to be a lovely discussion on the social impacts of clothing, May 23rd at 7 pm CET!
Indeed a very important caveat for AI systems: if you haven't evaluated the efficacy of your model's outputs, it's ill-advised to place trust in them. Thank you for keeping us honest, Chip Huyen!