Showing posts from 2018

AI Expo 2019: Tim Jurka (LinkedIn, Director Feed AI) - Part 4 of 4

I recently attended the AI Expo 2019 at the Santa Clara Convention Center. These notes reflect my understanding of the talk. Any errors are mine and mine alone. LinkedIn: A look behind the AI that powers the LI feed. Tim Jurka (Dir. Feed AI). The talk focused on the objectives of LinkedIn's Feed and was pitched at a high-level (exec) audience. While I was familiar with the space, the formulation and presentation of the objective function were interesting: the recommendation problem for LinkedIn is maximizing Like/Comment/Share CTR + downstream network activation (virals) + encouraging new creators. Problem Formulation: P(click) + P(viral) * (alpha_downstream + alpha_creator * e^(-decay * E[num_response_to_creator])). alpha_downstream accounts for downstream effects; the alpha_creator term shrinks as a creator's expected responses grow, which penalizes popular creators to induce diversity. General approaches (Toolbox): Multi Objective Optimization (ads vs organic content). Logistic Regression: Features, Embedding
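
To make the problem formulation concrete, here is a minimal Python sketch of how I read it. The function name `feed_score`, the parameter names, and every numeric value are my own assumptions for illustration; none of them come from the talk.

```python
import math


def feed_score(p_click, p_viral, expected_creator_responses,
               alpha_downstream=1.0, alpha_creator=1.0, decay=0.1):
    """Score one feed item under the formulation in the notes.

    All default weights are hypothetical placeholders, not LinkedIn's values.
    """
    # Creator bonus decays exponentially as the creator is expected to
    # receive more responses, so already-popular creators get less boost.
    creator_boost = alpha_creator * math.exp(-decay * expected_creator_responses)
    return p_click + p_viral * (alpha_downstream + creator_boost)


# Same click and viral probabilities, different creator popularity.
new_creator = feed_score(0.10, 0.20, expected_creator_responses=0)
popular_creator = feed_score(0.10, 0.20, expected_creator_responses=100)
print(new_creator, popular_creator)
```

With identical click and viral probabilities, the item from the creator with no expected responses scores higher than the one from the popular creator, which is the diversity effect described above.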

AI Expo 2019: Emilio Billi (CTO, A3Cube) - Part 3 of 4

Why and How the computational power influences the rate of progress in the technology. Emilio Billi, CTO, A3Cube Inc. Background: ML, Big Data & Analytics, AI, HPC. This was a big data infra focused talk. The speaker had a background in systems infra with past DoD experience. Not the most engaging delivery, but really nice takeaways: Moving 128 bytes on a CPU using 100Gbit ETH: the CPU waits 8900ns for nothing (~7.1M compute ops lost). Moving the same 128 bytes using optimized RDMA intra-cluster costs 1200ns of CPU time (~0.96M compute ops lost). By those figures you get back roughly 6M compute ops per transfer for ML. That's a great acceleration for ML workloads. Basic contention: ETH, TCP, and slow storage are legacy technology. The clusters of the future will look like the supercomputer systems of today: 1. Low latency converged parallel file systems (think S3 for the cluster). 2. Built in Distributed Resource scheduler (think Kubernetes for the cluster). 3. Cooperative RAM over networ
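
As a sanity check on the latency figures quoted above, the arithmetic can be worked through in a few lines of Python. The four input numbers are the ones from my notes; the variable names and the derived quantities are my own, so treat this as a back-of-the-envelope sketch rather than anything the speaker presented.

```python
# Figures as quoted in the notes (128-byte transfer).
ETH_WAIT_NS = 8900        # CPU stall over 100Gbit Ethernet
ETH_OPS_LOST = 7.1e6      # compute ops lost during that stall
RDMA_WAIT_NS = 1200       # CPU time with optimized RDMA
RDMA_OPS_LOST = 0.96e6    # compute ops lost with RDMA

# Implied compute rate behind each figure (ops per nanosecond).
eth_rate = ETH_OPS_LOST / ETH_WAIT_NS
rdma_rate = RDMA_OPS_LOST / RDMA_WAIT_NS

# Ops recovered per transfer by switching to RDMA, and the latency ratio.
ops_recovered = ETH_OPS_LOST - RDMA_OPS_LOST
latency_ratio = ETH_WAIT_NS / RDMA_WAIT_NS

print(f"implied rate: {eth_rate:.0f} vs {rdma_rate:.0f} ops/ns")
print(f"ops recovered per transfer: {ops_recovered:.3g}")
print(f"latency ratio: {latency_ratio:.1f}x")
```

Both loss figures imply roughly the same compute rate (about 800 ops per nanosecond), so the quoted numbers are internally consistent, and the difference between them is about 6.1M ops for each 128-byte move.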

AI Expo 2019 - Prakhar Mehrotra (Walmart Sr. Director of ML) - Part 2 of 4

I attended the AI Expo 2019 at the Santa Clara Convention Center where Prakhar gave a talk. Notes are my summarization of the talk. Any errors are mine and mine alone. Walmart - Prakhar Mehrotra (Sr. Director of ML, previously at Uber). Walmart has huge scale: $0.5 trillion+ in revenue, 3000+ stores with massive physical footprints, a massive global supply chain, and it keeps growing. The talk focused on Walmart's application of ML and the contrast between Uber-style surge pricing and Walmart's fixed in-store pricing ("everyday low pricing"). A focus point was causality over correlation: understanding Walmart's customer and its supply chain (the Why?). Their primary domain was solving for shelf placement of inventory. Other interesting problems were inventory management, bridging the online and offline worlds (if we ship from warehouse, it's going to cost you X but if you pick up at this store where it

AI Expo 2019 Notes: Ameen Kazerouni (Zappos) - Part 1 of 4

I recently attended the AI Expo 2019 at the Santa Clara Convention Center where there were talks on various ML platforms. I'm leading the ML Training Infra platform at Pinterest. These notes are from those talks and are summarized from the speakers' presentations. Any and all errors are mine alone. Hope you find the below useful. Speaker: Ameen Kazerouni (Zappos). The scope of the talk was Zappos' ML Platform ecosystem and the problems they faced after solving the basic five: problem specification, dataset design, model selection, training, and validation. A condensed list of their issues is: 1. Data management: data lifetime (how long?), security footprint (who should have access?), governance issues (who did have access?), data scrubbing and anonymization (avoiding privacy issues under GDPR). 2. Team: There are very few unicorns who can do PhD statistics and ML math and also write distributed systems. They hire for domain competence and the ability to communicate t

The toughest interview questions asked recently

During a recent round of interviews, I was asked these 5 questions that I found particularly interesting / challenging. They each covered a different interesting piece of computer science that I thought worth sharing. Check them out and see what you think about them: 1. A set of overlapping ranges is provided in [start, end) format. Find the max number of ranges that overlap each given range. Improve your solution to O(N log N) complexity (the basic solution is O(N^2) for a set of ascending ranges [1, 4), [2, 4), [3, 4) etc.). 2. Implement a multithreaded producer / consumer queue using condition variables. 3. Implement a multithreaded rate limiter (token bucket with defined capacity) using no hard-coded poll durations and without a background thread for "filling". Discuss fairness vs head of line blocking tradeoffs of the implementation. 4. Implement a multithreaded scheduler that executes tasks repeatedly at specified time intervals which manages task overruns (tas
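
For question 1, here is one possible O(N log N) approach, sketched in Python. It rests on my reading of the question (for each range, count how many other ranges overlap it) and assumes every range is non-empty, i.e. start < end. The function name `overlap_counts` is my own, and this is not necessarily the answer the interviewer had in mind.

```python
from bisect import bisect_left, bisect_right


def overlap_counts(ranges):
    """For each half-open range [start, end), count the other ranges overlapping it.

    Assumes start < end for every range. Sorting costs O(N log N) and each
    range then needs two binary searches.
    """
    starts = sorted(s for s, _ in ranges)
    ends = sorted(e for _, e in ranges)
    counts = []
    for s, e in ranges:
        # Ranges that begin before this one ends ...
        started_before_end = bisect_left(starts, e)
        # ... minus ranges that already finished by the time this one starts.
        ended_by_start = bisect_right(ends, s)
        # What remains overlaps [s, e); subtract 1 for the range itself.
        counts.append(started_before_end - ended_by_start - 1)
    return counts


print(overlap_counts([(1, 4), (2, 4), (3, 4)]))  # [2, 2, 2]
print(overlap_counts([(1, 2), (2, 3)]))          # [0, 0]
```

The idea is that a range fails to overlap [s, e) only if it starts at or after e, or ends at or before s, and those two groups cannot intersect, so two sorted arrays are enough. Because the ranges are half-open, [1, 2) and [2, 3) touch without overlapping.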

Great workplace habits

Wellness: Maintaining a healthy body, mind, and spirit/mood. Self-presentation: Controlling one’s grooming, attire, and manners, given the social and cultural situation at hand, so as to make a positive impression on others. Timeliness: Arriving early, staying late, and taking short breaks. Meeting or beating schedules and deadlines. Productivity: Working at a fast pace without significant interruptions. Organization: Using proven systems for documentation and tracking: note taking, project plans, checklists, and filing. Attention to detail: Following instructions, standard operating procedures, specifications, and staying focused and mindful in performing tasks and responsibilities. Follow-through and consistency: Fulfilling your commitments and finishing what you start. Initiative: Being a self-starter. Taking productive action without explicit direction. Going above and beyond; the extra mile. Found from:

Actionable Production Escalations

I've long considered the following items the basics of an actionable production escalation. These were taught to me by Googlers (mostly when I violated these understated values). The fundamentals of any production escalation require the documentation of the following from SREs: 1. An exception, call graph, logs or metrics showing the problem. 2. A first pass characterization of the problem (what is it / how much impact). 3. Why me? (Do we need a PoC that you wouldn't know otherwise?) 4. What have you already tried? 5. Things that you have noted that are out of the ordinary. 6. How specifically can I help solve this problem? (Find a PoC? Look at the code? Judge downstream impact? Validate severity?) Following the above process keeps a check on the level of due diligence needed before a Dev escalation. It also helps formulate concrete action items as part of the escalation process. I've found that this helps resolve issues quicker and keeps the prod overhead low for devs. Wha
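
One way to make the six items above hard to skip is to encode them as a small template that flags blanks before the escalation is sent. The Python sketch below is purely illustrative: the `Escalation` class, its field names, and the sample values are my own invention, not a tool I have seen used at any particular company.

```python
from dataclasses import dataclass, fields


@dataclass
class Escalation:
    """Hypothetical template mirroring the six items of an actionable escalation."""
    evidence: str          # 1. exception, call graph, logs or metrics
    characterization: str  # 2. what is it / how much impact
    why_me: str            # 3. why this particular dev
    already_tried: str     # 4. what has been tried so far
    anomalies: str         # 5. anything out of the ordinary
    specific_ask: str      # 6. how the dev can help

    def missing_items(self):
        # Names of any fields left blank (empty or whitespace only).
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = Escalation(
    evidence="stack trace attached",
    characterization="5% of requests failing",
    why_me="",
    already_tried="restarted the job",
    anomalies="   ",
    specific_ask="please look at the retry code path",
)
print(draft.missing_items())  # ['why_me', 'anomalies']
```

An escalation with a non-empty `missing_items()` list is a signal that more due diligence is needed before paging a dev.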