• 0 Posts
  • 47 Comments
Joined 1 year ago
Cake day: August 25th, 2023



  • lol. Did this in my old building - the dryer was on an improperly rated circuit and the breaker would trip half the time, eating my money and leaving wet clothes.

    It was one of the old, “insert coin, push metal chute in” types. Turns out you could bend a coat hanger and fish it through a hole in the back to engage the lever that the push-mechanism was supposed to engage. Showed everyone in the building.

    The landlord came by the building a month later and asked why there was no money in the machines. I told him, “We all started going to the laundromat down the street because it was cheaper.”


  • These people aren’t placing bets on who they want to win; they are placing bets where the bookmaker’s implied odds differ from the actual expected probability. The people throwing big money on this are doing it based on actual data (aggregating polls, etc.), not just gut feelings.

    If I think Kamala has a 45% chance of winning the election and the bookie is giving her implied odds of 40%, I should take that bet. Even though I think she will lose, a winning bet pays 1/0.40 = 2.5× my stake, so my expected return is 0.45 × 2.5 = 1.125, a 12.5% expected ROI. I can then hedge that bet at another bookmaker giving her 48% implied odds, and if enough people do this, the bookmakers’ odds will converge somewhere in between (around 44%).
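    The arithmetic above can be sketched in a few lines. This is a minimal illustration that assumes odds are quoted as implied probabilities with no bookmaker margin (real books add a vig, so their implied probabilities sum to more than 100%); the function names are hypothetical. The hedge backs Kamala at the 40% book and bets on her opponent at the 48% book, sizing the stakes so the payout is the same whichever way the election goes.

```python
def expected_roi(p_true: float, q_implied: float) -> float:
    """Expected return per unit staked when the book prices an outcome at
    implied probability q_implied but you believe its true probability is p_true.
    Assumes fair odds (no bookmaker margin): a winning bet pays 1 / q_implied."""
    return p_true / q_implied - 1.0


def hedge_stakes(q_back: float, q_other_book: float, bankroll: float = 1.0):
    """Back an outcome at one book (implied probability q_back) and bet against it
    at a second book that prices the same outcome at q_other_book.
    Returns (stake_for, stake_against, payout), with stakes chosen so the payout
    is identical whichever way the event goes."""
    q_against = 1.0 - q_other_book  # implied probability of the opposite outcome
    total = q_back + q_against
    if total >= 1.0:
        raise ValueError("no locked-in profit: implied probabilities sum to >= 1")
    stake_for = bankroll * q_back / total
    stake_against = bankroll * q_against / total
    # Equal to stake_for / q_back and to stake_against / q_against by construction.
    payout = bankroll / total
    return stake_for, stake_against, payout


if __name__ == "__main__":
    print("expected ROI:", round(expected_roi(0.45, 0.40), 4))
    s_for, s_against, payout = hedge_stakes(0.40, 0.48)
    print(f"stake on Kamala {s_for:.3f}, stake against {s_against:.3f}, payout {payout:.3f}")
```

    With the numbers from the comment, the 40% and 52% implied probabilities sum to 0.92, so the hedged position returns 1/0.92 of the bankroll (roughly an 8.7% locked-in gain) regardless of the result. That gap is exactly what arbitrageurs trade away, pulling the two books’ prices toward each other.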


  • but either way I don’t think this “market” knew more than the mainstream media was telling us.

    No, but it is an aggregation of all the available public information (and some private information you won’t find elsewhere) into a single metric. If you read a single article, you might assume there is either a 100% chance Biden drops out or a 0% chance; if you read every news article in existence and aggregated all the social media buzz, polls, etc. into a statistical likelihood, you would likely come out with a number that closely matches the odds.

    Biden was only going to drop out once, so you can’t say how closely these odds matched the actual likelihood on this specific measure, but if you analyze hundreds of prediction markets like this, the implied odds correlate pretty strongly with the actual binary outcomes.
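    “Correlates with the outcomes” is usually checked as calibration: bucket many one-off events by their implied probability and compare each bucket’s average implied probability with how often those events actually happened, optionally summarizing with a Brier score. The sketch below is a minimal, hypothetical version of that check run on synthetic data (a simulated market whose events happen with exactly their implied probability), not on real prediction-market data.

```python
import random


def calibration_table(preds, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and compare the mean
    forecast in each bin with how often the event actually happened."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[i].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows


def brier_score(preds, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, perfectly calibrated "market": each event happens with its implied probability.
    preds = [rng.random() for _ in range(100_000)]
    outcomes = [1 if rng.random() < p else 0 for p in preds]
    for mean_p, freq, n in calibration_table(preds, outcomes):
        print(f"implied {mean_p:.2f}  realized {freq:.2f}  (n={n})")
    print("Brier (market):", round(brier_score(preds, outcomes), 3))
    print("Brier (always 0.5):", brier_score([0.5] * len(preds), outcomes))
```

    A single event like Biden dropping out contributes just one 0/1 point to one bin, which is why calibration can only be judged across many markets.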







  • There are like 10,000 different solutions, but I would just recommend using what’s built into Python (the venv module).

    If you have multiple versions installed, you should be able to call python3.12 to use 3.12, etc.

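    Best practice is to use a different virtual environment for every project, which is basically a lightweight clone of an existing Python install (it links to or copies the interpreter) with its own packages folder. Calling pip with the system Python installs packages globally (or into your user directory), where every project shares them. Calling it with sudo puts the packages in the system-wide directory that the operating system’s own tools depend on, which can create conflicts and break stuff. Recent distributions guard against this: they mark the system Python as externally managed (PEP 668), and pip refuses to install there unless you pass --break-system-packages.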

    Make a virtual environment with python3.13 -m venv venv (the second venv is the directory name). Instead of calling the system Python, call the executable at venv/bin/python3.

    If you run source venv/bin/activate, it prepends venv/bin to your PATH for the current shell session, so commands like python and pip point to the executables in your venv instead of the system Python install. Run deactivate to revert. IDEs should detect the virtual environment in your project folder and activate it automatically.
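    The same thing can be done from Python itself with the standard-library venv module. This is a minimal sketch, not a replacement for the command above: make_venv and interpreter_prefixes are hypothetical helper names, and with_pip=False skips installing pip (the command-line python -m venv installs it by default). It creates a venv in a temporary directory and asks the new interpreter where it lives; inside a venv, sys.prefix points at the venv folder while sys.base_prefix still points at the Python install it was created from.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def make_venv(env_dir: Path) -> Path:
    """Create a virtual environment in env_dir and return its interpreter path."""
    # Like `python -m venv env_dir` (symlinks on POSIX, copies on Windows),
    # except with_pip=False skips pip so this works even without ensurepip.
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def interpreter_prefixes(python: Path) -> tuple[str, str]:
    """Return (sys.prefix, sys.base_prefix) as reported by the given interpreter."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix); print(sys.base_prefix)"],
        capture_output=True, text=True, check=True,
    )
    prefix, base_prefix = result.stdout.splitlines()
    return prefix, base_prefix


if __name__ == "__main__":
    print("running from a venv?", sys.prefix != sys.base_prefix)
    with tempfile.TemporaryDirectory() as tmp:
        python = make_venv(Path(tmp) / "venv")
        prefix, base = interpreter_prefixes(python)
        print("venv prefix:", prefix)  # the new venv directory
        print("base prefix:", base)    # the install it was created from
```

    Checking sys.prefix != sys.base_prefix is also a quick way for a script to tell whether it is currently running inside a virtual environment.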


  • Reddit has way more data than you would have been exposed to via the API, though: they can look at things like the ASN of the user’s IP address (is it coming from a datacenter?), whether they were using a VPN, and behavioral signals like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.

    no one at reddit is going to hunt these sophisticated bots because they inflate numbers

    You are conflating “don’t care about bots” with “don’t care about showing bot-generated content to users.” If the latter increases activity and engagement, there is no reason to put a stop to it. However, when it comes to building predictive models, A/B testing, and other internal decisions, they have a vested financial interest in making sure they are focusing on organic users: how humans interact with humans and/or bots is meaningful data; how bots interact with other bots is not.



  • To compare every comment on reddit to every other comment in reddit’s entire history would require an index

    You think in Reddit’s 20-year history no one has thought of indexing comments for data science workloads? A cursory glance at their engineering blog indicates they already perform much more computationally demanding tasks on comment data for content filtering.

    you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much

    Analytics workloads are generally not run on the production database but on read replicas, which are updated asynchronously from the transaction logs so they don’t affect production read/write performance.

    Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.

    Reddit’s entire monetization strategy is collecting user data and selling it to advertisers; it’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement.
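    On the indexing point: finding copied comments doesn’t require comparing every comment with every other one. Here is a minimal sketch, assuming comments arrive as hypothetical (comment_id, text) pairs in posting order. It is not a description of Reddit’s actual pipeline. Each text is normalized and hashed, and the digest is looked up in a dictionary, so each comment costs one lookup instead of a scan of the whole history. It only catches exact copies after normalization (case, punctuation, whitespace); catching reworded reposts would need a near-duplicate technique such as MinHash with locality-sensitive hashing, which is not shown here.

```python
import hashlib
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Canonicalize a comment so trivial edits don't hide a copy."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation
    return " ".join(text.split())        # collapse runs of whitespace


def find_reposts(comments):
    """comments: iterable of (comment_id, text) in chronological order.
    Returns {first_id: [later_ids, ...]} for texts that appear more than once."""
    index = {}                    # digest of normalized text -> first comment id
    reposts = defaultdict(list)
    for cid, text in comments:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in index:
            reposts[index[digest]].append(cid)
        else:
            index[digest] = cid
    return dict(reposts)


if __name__ == "__main__":
    sample = [
        ("a", "Great point, thanks!"),
        ("b", "Something else entirely"),
        ("c", "great point thanks"),
        ("d", "Great   point, THANKS"),
    ]
    print(find_reposts(sample))
```

    In a real system the dictionary would be a persistent key-value store or a database column with an index on the digest, which is the kind of index the quoted comment assumes nobody has built.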