On this website we present our progress for the 2021 course Audio Processing and Indexing at the Leiden Institute of Advanced Computer Science. Repository
These are the final samples for our project. We generated them using the different model snapshots that we recorded over a week of training. The snapshots were taken at 21k, 30k and 70k steps. Please note that these are not the 'raw' outputs: our dataset was slowed down (130 -> 120 bpm) and these samples have been sped up again (120 -> 130 bpm).
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
We also updated our similarity table to reflect our new data.
Set | Dself | Dtrain |
Train | 0 | 1.07 +- 0.29 |
Test | 1.07 +- 0.20 | 1.07 +- 0.20 |
21k | 6.87 +- 0.12 | 0.65 +- 0.77 |
29k | 6.87 +- 0.14 | 0.65 +- 0.94 |
72k | 6.87 +- 0.14 | 0.65 +- 0.10 |
Lastly, we surveyed a group of 16 participants. The survey can be found here. Our results are presented in the table below.
Set | Mean | St. dev |
Train | 2.03 | 0.94 |
72k | 3.41 | 1.01 |
29k | 3.49 | 1.25 |
21k | 3.54 | 1.07 |
The goal of our paper was to train a GAN on techno music. To achieve this, we created a dataset consisting of techno music audio samples.
We trained our network for 30,000 steps, and these are our results. Please note that these are not the 'raw' outputs: our dataset was slowed down (130 -> 120 bpm) and these samples have been sped up again (120 -> 130 bpm).
Sample 1
Sample 2
Sample 3
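The tempo change mentioned above comes down to a single speed ratio. The sketch below is only an illustration, not the tool we used: the function names `tempo_ratio` and `resample_linear` are hypothetical, and it assumes the speed change is done by plain resampling, which also shifts the pitch. A pitch-preserving time stretch would need a different method.

```python
def tempo_ratio(source_bpm, target_bpm):
    """Playback speed factor needed to go from source_bpm to target_bpm."""
    return target_bpm / source_bpm


def resample_linear(samples, ratio):
    """Change playback speed by `ratio` using linear interpolation.

    ratio > 1 shortens the clip (faster), ratio < 1 lengthens it (slower).
    Assumption: plain resampling, so the pitch changes along with the tempo.
    """
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio                      # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last input sample
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


# Slowing the dataset down (130 -> 120 bpm) and speeding the output up again.
slow_down = tempo_ratio(130, 120)   # below 1: clip gets longer
speed_up = tempo_ratio(120, 130)    # above 1: clip gets shorter
```

The two ratios are each other's inverse, so a clip that is slowed down and then sped up again returns to its original tempo.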
One of the ways we measure the fitness of our model is by comparing properties of our datasets. One of these measures is the Euclidean distance from each sample in a query set to its k-nearest neighbor in the training set. By comparing these measures we can estimate the diversity (Dself) and the similarity to the training data (Dtrain).
Set | Dself | Dtrain |
Train (real) | 1.06 +- 0.29 | 0 |
Test (real) | 1.08 +- 0.19 | 1.28 +- 0.23 |
Inference | 0.83 +- 0.82 | 2.50 +- 0.12 |
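The distance measures above can be sketched as follows. This is a minimal illustration under stated assumptions, not our evaluation code: each audio clip is assumed to be already converted to a fixed-length feature vector (the feature extraction is not shown), only the single nearest neighbor (k = 1) is used, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def nearest_neighbor_distances(query, reference, exclude_self=False):
    """Euclidean distance from each query vector to its nearest reference vector.

    With exclude_self=True, query and reference are assumed to be the same
    list, and a vector is not allowed to match itself.
    """
    distances = []
    for i, q in enumerate(query):
        candidates = [
            math.dist(q, r)
            for j, r in enumerate(reference)
            if not (exclude_self and i == j)
        ]
        distances.append(min(candidates))
    return distances


def d_self(features):
    """Diversity: mean and st. dev. of nearest-neighbor distances within one set."""
    d = nearest_neighbor_distances(features, features, exclude_self=True)
    return mean(d), pstdev(d)


def d_train(features, train_features):
    """Similarity: mean and st. dev. of distances to the nearest training vector."""
    d = nearest_neighbor_distances(features, train_features)
    return mean(d), pstdev(d)


# Toy 2-D feature vectors, made up purely for illustration.
train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
generated = [(0.0, 1.0), (3.0, 5.0)]
print(d_self(generated))
print(d_train(generated, train))
```

In this sketch, comparing the training set with itself gives a Dtrain of 0, because every vector is its own nearest neighbor, which is consistent with the 0 entry for the training set in the table above.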
We started off with code from the original authors of the Audio Synthesis paper.
The authors' main focus was on a dataset of spoken recordings of the digits 0 through 9 (SC09). Since our focus lies on generating music samples, we first tried the Generative Adversarial Network (GAN) on piano sounds.
Here are some samples we generated along the way:
audio after 200 epochs
audio after 500 epochs
audio after 1000 epochs
In the future we hope to train the network on some audio recordings of techno music.