In our latest round-up, we look at Jupyter Notebook and self-directed learning, an excellent quick guide to Excel from Microsoft, a terrifying data visualization, why software project plans can go awry and more. Here are interesting data news stories, articles and resources we found over the last 30 days or so.

 

Why software projects take longer than you think – a statistical model

This excellent study by Erik Bernhardsson considers why developers get their time-to-completion estimates so wrong, so often. There are certain biases at play, but a key issue in making such projections is the difference between the median time to complete and the mean.

Bernhardsson argues that, although estimates often seem way off the mark, software developers are generally good at estimating the median project completion time from experience. The mean is where the problem lies: mathematically, the “mean time to complete a task we know nothing about is actually infinite”!

Here are some further great takeaways (click here to read the full article):

  • The mean turns out to be substantially worse than the median, due to the distribution being skewed (log-normally).
  • When you add up the estimates for n tasks, things get even worse.
  • Tasks with the most uncertainty (rather than the biggest size) can often dominate the mean time it takes to complete all tasks.
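
To make the median versus mean gap concrete, here is a minimal Python sketch. It assumes the "blowup factor" (actual time divided by estimated time) is log-normal with a median of 1, which is one reading of the article's argument that developers estimate the median well. The function names and the sigma values are our own illustrative choices, not taken from Bernhardsson's article.

```python
import math
import random
import statistics


def mean_over_median(sigma: float) -> float:
    """Ratio of mean to median for a log-normal blowup factor.

    For a log-normal with parameters (mu, sigma), the median is exp(mu)
    and the mean is exp(mu + sigma**2 / 2), so the ratio is exp(sigma**2 / 2).
    """
    return math.exp(sigma ** 2 / 2)


def simulate_blowup(sigma: float, n_samples: int = 100_000, seed: int = 0):
    """Sample blowup factors with median 1 (mu = 0); return (median, mean)."""
    rng = random.Random(seed)
    samples = [rng.lognormvariate(0.0, sigma) for _ in range(n_samples)]
    return statistics.median(samples), statistics.fmean(samples)


def expected_total(estimates, sigmas) -> float:
    """Expected total time when each estimate is a median and each task
    has its own (assumed) uncertainty sigma."""
    return sum(est * mean_over_median(s) for est, s in zip(estimates, sigmas))


if __name__ == "__main__":
    median, mean = simulate_blowup(sigma=1.0)
    print(f"simulated median: {median:.2f}, simulated mean: {mean:.2f}")

    # Hypothetical tasks: a large well-understood one and a small uncertain one.
    print("large, low uncertainty :", round(expected_total([5.0], [0.5]), 2))
    print("small, high uncertainty:", round(expected_total([1.0], [2.0]), 2))
```

Under these assumptions the small but highly uncertain task has the larger expected duration, which is the third takeaway above in miniature.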

 

The Microsoft Quick Guide to Excel

Microsoft have created a handy quick guide to using Excel with 4 easy charts.

From creating your first spreadsheet, to managing data, to analysis and visualization, to sharing and collaboration – new and existing MS Excel users can learn a lot from this new page on the Microsoft website.

If you’re interested in improving your use of Excel and guarding against errors through formula visualizations, you might also be interested in EQUS – our handy Excel add-in.

 

Interview with Bradley Voytek – the “Grandfather of Data Science” at Uber

Now holding several prestigious academic roles at University of California San Diego, Bradley Voytek has also had a dramatic impact on the development of Uber.

He discusses his already packed career history in this interview with SuperDataScience.

 

Jupyter Notebooks & Self-Directed Learning

Jupyter Notebooks are similar to a textbook in terms of their content and initial layout, but with a layer of flexibility – offering access to live code, equations and data visualizations. This takes knowledge development from “this is how it is” to “let’s see how this might look”, giving learners the ability to modify code and run experiments, building from the starting base of information.

Jupyter, and other types of Notebook, create interactive, asynchronous learning environments with malleable data.

Learn about how these new tools are changing the world of Learning and Development in this article from Learning Solutions Mag.

 

Amazon Uses AI to Reduce Alexa Speech Recognition Errors By Up to 22%

Amazon is using huge data sets and ‘semi-supervised learning’ with artificial intelligence to make further improvements to Alexa Speech Recognition.

The latest project, detailed in this VentureBeat article, uses 7,000 hours of labelled data and 1 million hours of unlabelled data for AI-driven improvements to what is already a leading example of automatic human speech recognition.

 

Countdown to Armageddon – Our CO2 Budget

Fascinating but terrifying, the full CO2 emissions data visualization from Information is Beautiful looks at our potential ‘carbon dioxide budget’ and the much-discussed global warming tipping point…